Overview

Dataset statistics

Number of variables 116
Number of observations 7000490
Missing cells 693626847
Missing cells (%) 85.4%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 6.1 GiB
Average record size in memory 928.0 B
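
The headline percentages in this table can be re-derived from the raw counts. The sketch below uses only the figures listed above; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the "Dataset statistics" table; names are illustrative.
n_variables = 116
n_observations = 7_000_490
missing_cells = 693_626_847
avg_record_bytes = 928.0

# Every row contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size times row count should land near the reported total size.
total_gib = avg_record_bytes * n_observations / 2**30

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"memory:        {total_gib:.2f} GiB")
```

The result agrees with the reported 85.4% missing cells, and the memory estimate is consistent with the rounded 6.1 GiB.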

Variable types

Text 25
Numeric 46
Categorical 44
DateTime 1

Alerts

meddra_concept_class_id_1 has constant value "PT" Constant
meddra_concept_class_id_2 has constant value "HLT" Constant
meddra_concept_class_id_3 has constant value "HLGT" Constant
meddra_concept_class_id_4 has constant value "SOC" Constant
relationship_id_12 has constant value "Is a" Constant
relationship_id_23 has constant value "Is a" Constant
relationship_id_34 has constant value "Is a" Constant
ATC_concept_class_id has constant value "ATC 5th" Constant
MedDRA_concept_class_id has constant value "PT" Constant
gt_null_99 is highly imbalanced (74.8%) Imbalance
max_score_nichd is highly imbalanced (55.5%) Imbalance
table_name is highly imbalanced (51.7%) Imbalance
XB is highly imbalanced (75.8%) Imbalance
XD is highly imbalanced (68.7%) Imbalance
XG is highly imbalanced (78.0%) Imbalance
XH is highly imbalanced (52.1%) Imbalance
XM is highly imbalanced (72.4%) Imbalance
XP is highly imbalanced (83.9%) Imbalance
XR is highly imbalanced (68.5%) Imbalance
XS is highly imbalanced (92.7%) Imbalance
XV is highly imbalanced (88.7%) Imbalance
action is highly imbalanced (52.9%) Imbalance
pediatric_adverse_event is highly imbalanced (74.6%) Imbalance
ade has 849821 (12.1%) missing values Missing
atc_concept_id has 904638 (12.9%) missing values Missing
meddra_concept_id has 901107 (12.9%) missing values Missing
cluster_id has 6539667 (93.4%) missing values Missing
gt_null_statistic has 6539667 (93.4%) missing values Missing
gt_null_99 has 6539667 (93.4%) missing values Missing
max_score_nichd has 6539667 (93.4%) missing values Missing
cluster_name has 6539667 (93.4%) missing values Missing
ade_nreports has 6539667 (93.4%) missing values Missing
nichd has 562425 (8.0%) missing values Missing
gam_score has 3704631 (52.9%) missing values Missing
norm has 3774631 (53.9%) missing values Missing
gam_score_se has 3774631 (53.9%) missing values Missing
gam_score_90mse has 3774631 (53.9%) missing values Missing
gam_score_90pse has 3774631 (53.9%) missing values Missing
D has 3774631 (53.9%) missing values Missing
E has 3774631 (53.9%) missing values Missing
DE has 3774631 (53.9%) missing values Missing
ade_name has 3707027 (53.0%) missing values Missing
category has 6378728 (91.1%) missing values Missing
atc_concept_name has 6310612 (90.1%) missing values Missing
meddra_concept_name has 6311700 (90.2%) missing values Missing
atc_concept_class_id has 6421685 (91.7%) missing values Missing
meddra_concept_class_id has 6378728 (91.1%) missing values Missing
a has 6378728 (91.1%) missing values Missing
b has 6378728 (91.1%) missing values Missing
c has 6378728 (91.1%) missing values Missing
d has 6378728 (91.1%) missing values Missing
lwr has 6378728 (91.1%) missing values Missing
odds_ratio has 6762567 (96.6%) missing values Missing
upr has 6762567 (96.6%) missing values Missing
pvalue has 6378728 (91.1%) missing values Missing
fdr has 6184674 (88.3%) missing values Missing
null_99 has 7000483 (> 99.9%) missing values Missing
safetyreportid has 4674107 (66.8%) missing values Missing
sex has 4674107 (66.8%) missing values Missing
reporter_qualification has 4674107 (66.8%) missing values Missing
receive_date has 4674107 (66.8%) missing values Missing
XA has 4674107 (66.8%) missing values Missing
XB has 4674107 (66.8%) missing values Missing
XC has 4674107 (66.8%) missing values Missing
XD has 4674107 (66.8%) missing values Missing
XG has 4674107 (66.8%) missing values Missing
XH has 4674107 (66.8%) missing values Missing
XJ has 4674107 (66.8%) missing values Missing
XL has 4674107 (66.8%) missing values Missing
XM has 4674107 (66.8%) missing values Missing
XN has 4674107 (66.8%) missing values Missing
XP has 4674107 (66.8%) missing values Missing
XR has 4674107 (66.8%) missing values Missing
XS has 4674107 (66.8%) missing values Missing
XV has 4674107 (66.8%) missing values Missing
polypharmacy has 4674107 (66.8%) missing values Missing
atc1_concept_name has 6999412 (> 99.9%) missing values Missing
raw_code has 7000475 (> 99.9%) missing values Missing
gene_symbol has 6792831 (97.0%) missing values Missing
type has 6987933 (99.8%) missing values Missing
soc has 6933227 (99.0%) missing values Missing
auroc has 7000255 (> 99.9%) missing values Missing
wt_pvalue has 7000255 (> 99.9%) missing values Missing
ttest_statistic has 7000255 (> 99.9%) missing values Missing
ttest_pvalue has 7000255 (> 99.9%) missing values Missing
atc_concept_code has 6932374 (99.0%) missing values Missing
ndrugreports has 6999402 (> 99.9%) missing values Missing
atc4_concept_name has 6999402 (> 99.9%) missing values Missing
atc4_concept_code has 6999402 (> 99.9%) missing values Missing
atc3_concept_name has 6999402 (> 99.9%) missing values Missing
atc3_concept_code has 6999402 (> 99.9%) missing values Missing
atc2_concept_name has 6999402 (> 99.9%) missing values Missing
atc2_concept_code has 6999402 (> 99.9%) missing values Missing
atc1_concept_code has 6999426 (> 99.9%) missing values Missing
drugbank_id has 6988168 (99.8%) missing values Missing
id has 6988168 (99.8%) missing values Missing
action has 6988168 (99.8%) missing values Missing
uniprot_id has 6988168 (99.8%) missing values Missing
entrez_id has 6988168 (99.8%) missing values Missing
meddra_concept_name_4 has 6983552 (99.8%) missing values Missing
neventreports has 6983549 (99.8%) missing values Missing
meddra_concept_class_id_1 has 6983552 (99.8%) missing values Missing
meddra_concept_class_id_2 has 6983552 (99.8%) missing values Missing
meddra_concept_class_id_3 has 6983552 (99.8%) missing values Missing
meddra_concept_class_id_4 has 6983552 (99.8%) missing values Missing
meddra_concept_code_1 has 6983552 (99.8%) missing values Missing
meddra_concept_code_2 has 6983552 (99.8%) missing values Missing
meddra_concept_code_3 has 6983552 (99.8%) missing values Missing
meddra_concept_code_4 has 6983552 (99.8%) missing values Missing
meddra_concept_id_2 has 6983552 (99.8%) missing values Missing
meddra_concept_id_3 has 6983552 (99.8%) missing values Missing
meddra_concept_id_4 has 6983552 (99.8%) missing values Missing
meddra_concept_name_1 has 6983549 (99.8%) missing values Missing
meddra_concept_name_2 has 6983549 (99.8%) missing values Missing
meddra_concept_name_3 has 6983549 (99.8%) missing values Missing
relationship_id_12 has 6983552 (99.8%) missing values Missing
relationship_id_23 has 6983552 (99.8%) missing values Missing
relationship_id_34 has 6983552 (99.8%) missing values Missing
soc_category has 6983649 (99.8%) missing values Missing
pediatric_adverse_event has 6983549 (99.8%) missing values Missing
probe has 6805388 (97.2%) missing values Missing
sample has 6806436 (97.2%) missing values Missing
actual has 6806436 (97.2%) missing values Missing
prediction has 6806436 (97.2%) missing values Missing
residual has 6806436 (97.2%) missing values Missing
f_statistic has 6806436 (97.2%) missing values Missing
f_pvalue has 6806436 (97.2%) missing values Missing
ATC_concept_class_id has 6999914 (> 99.9%) missing values Missing
ATC_concept_id has 6999914 (> 99.9%) missing values Missing
ATC_concept_name has 6999914 (> 99.9%) missing values Missing
Control has 6999914 (> 99.9%) missing values Missing
MedDRA_concept_class_id has 6999914 (> 99.9%) missing values Missing
MedDRA_concept_id has 6999914 (> 99.9%) missing values Missing
MedDRA_concept_name has 6999914 (> 99.9%) missing values Missing
condition_name has 6998141 (> 99.9%) missing values Missing
control has 6998141 (> 99.9%) missing values Missing
stitch_id has 6933462 (99.0%) missing values Missing
medgen_id has 6933462 (99.0%) missing values Missing
ade_nreports is highly skewed (γ1 = 56.06041929) Skewed
gam_score is highly skewed (γ1 = -68.47906826) Skewed
gam_score_se is highly skewed (γ1 = 635.0719086) Skewed
gam_score_90mse is highly skewed (γ1 = -634.9143922) Skewed
gam_score_90pse is highly skewed (γ1 = 635.2238839) Skewed
DE is highly skewed (γ1 = 168.5327063) Skewed
a is highly skewed (γ1 = 48.91934858) Skewed
c is highly skewed (γ1 = 46.18078914) Skewed
null_99 is uniformly distributed Uniform
ATC_concept_id is uniformly distributed Uniform
ATC_concept_name is uniformly distributed Uniform
norm has 460837 (6.6%) zeros Zeros
D has 126892 (1.8%) zeros Zeros
E has 538729 (7.7%) zeros Zeros
DE has 2453256 (35.0%) zeros Zeros
c has 383839 (5.5%) zeros Zeros
XA has 1621810 (23.2%) zeros Zeros
XC has 1931835 (27.6%) zeros Zeros
XJ has 1775841 (25.4%) zeros Zeros
XL has 1426884 (20.4%) zeros Zeros
XN has 1263277 (18.0%) zeros Zeros

Reproduction

Analysis started 2025-04-28 13:41:23.997215
Analysis finished 2025-04-28 14:08:42.628088
Duration 27 minutes and 18.63 seconds
Software version ydata-profiling v4.16.1
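
As a quick consistency check, the reported duration can be recomputed from the two timestamps. This is a small sketch using only the Python standard library; it assumes both timestamps are in the same time zone, which the report does not state.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block (assumed to share a time zone).
started = datetime.fromisoformat("2025-04-28 13:41:23.997215")
finished = datetime.fromisoformat("2025-04-28 14:08:42.628088")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```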

Variables

ade
Text

Missing 

Distinct 461234
Distinct (%) 7.5%
Missing 849821
Missing (%) 12.1%
Memory size 53.4 MiB
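
The report does not say which denominator each percentage uses. The sketch below, with illustrative names, suggests that Missing (%) is taken over all rows while Distinct (%) is taken over non-missing rows only; this is an inference from the numbers, not a documented rule.

```python
# Figures copied from the report; the denominators are an inference.
n_rows = 7_000_490
missing = 849_821
distinct = 461_234

present = n_rows - missing                      # rows with a non-missing value
missing_pct = 100 * missing / n_rows            # share of all rows
distinct_pct = 100 * distinct / present         # share of non-missing rows
distinct_pct_all_rows = 100 * distinct / n_rows # alternative denominator

print(f"missing:  {missing_pct:.1f}%")
print(f"distinct: {distinct_pct:.1f}% of non-missing rows")
print(f"distinct: {distinct_pct_all_rows:.1f}% of all rows")
```

Only the non-missing denominator reproduces the reported 7.5%.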

Length

Max length 17
Median length 17
Mean length 16.991213
Min length 14

Characters and Unicode

Total characters 104507328
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 395
Unique (%) < 0.1%

Sample

1st row 1588648_35809076
2nd row 1588648_36315755
3rd row 1588648_36416514
4th row 1588648_37019318
5th row 1588648_37019399
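
The sample values look like two integer identifiers joined by an underscore. Several prefixes and suffixes (for example 21604559 and 37522220) also appear among the frequent values of atc_concept_id and meddra_concept_id, so one plausible reading is that ade is a drug ID followed by an event ID. That interpretation is an assumption, and the helper name `split_ade` below is hypothetical.

```python
from typing import Tuple


def split_ade(key: str) -> Tuple[int, int]:
    """Split an ade key into its two numeric parts.

    Assumes the form '<id>_<id>'; treating the parts as
    (atc_concept_id, meddra_concept_id) is an interpretation,
    not something the report states.
    """
    left, right = key.split("_")
    return int(left), int(right)


drug_id, event_id = split_ade("1588648_35809076")
print(drug_id, event_id)
```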
Value Count Frequency (%)
21602295_37320158 3851 0.1%
21604559_35506601 3801 0.1%
21602295_36718382 3346 0.1%
21602295_37320109 3311 0.1%
21604559_37522220 2626 < 0.1%
21602295_37320257 2123 < 0.1%
21602256_35809005 2067 < 0.1%
21603911_35809005 2030 < 0.1%
21602295_37320170 2009 < 0.1%
21604757_35809304 1906 < 0.1%
Other values (461224) 6123599 99.6%

Most occurring characters

Value Count Frequency (%)
1 14580910 14.0%
0 13975096 13.4%
6 12902033 12.3%
3 12738669 12.2%
2 12481752 11.9%
5 7206028 6.9%
4 6697247 6.4%
_ 6150669 5.9%
9 6038455 5.8%
7 5974891 5.7%

Most occurring categories

Value Count Frequency (%)
(unknown) 104507328 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 104507328 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 104507328 100.0%

atc_concept_id
Real number (ℝ)

Missing 

Distinct 2038
Distinct (%) < 0.1%
Missing 904638
Missing (%) 12.9%
Infinite 0
Infinite (%) 0.0%
Mean 22018887
Minimum 1123609
Maximum 45893529
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 1123609
5-th percentile 21600448
Q1 21601911
median 21603312
Q3 21604305
95-th percentile 21604762
Maximum 45893529
Range 44769920
Interquartile range (IQR) 2394
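
Range and IQR follow directly from the quantiles above. The sketch below recomputes them and adds Tukey's conventional 1.5 × IQR fences as an illustration; the fences are not something the report itself computes, and because these values are concept identifiers rather than measurements, "outlier" here only means "far from the dense ID block".

```python
# Quantiles copied from the table above.
minimum, q1, q3, maximum = 1_123_609, 21_601_911, 21_604_305, 45_893_529

value_range = maximum - minimum
iqr = q3 - q1

# Tukey fences: a common rule of thumb, added here for illustration only.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"range: {value_range}, IQR: {iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"minimum outside fences: {minimum < lower_fence}")
print(f"maximum outside fences: {maximum > upper_fence}")
```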

Descriptive statistics

Standard deviation 3067785.4
Coefficient of variation (CV) 0.13932518
Kurtosis 41.932231
Mean 22018887
Median Absolute Deviation (MAD) 1067
Skewness 5.9558132
Sum 1.3422388 × 10^14
Variance 9.411307 × 10^12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
21603929 80845 1.2%
21601423 76951 1.1%
21602256 73173 1.0%
21604344 59193 0.8%
21603911 58962 0.8%
21602735 58691 0.8%
21603967 56175 0.8%
21604757 53428 0.8%
21603908 51997 0.7%
21604559 51615 0.7%
Other values (2028) 5474822 78.2%
(Missing) 904638 12.9%

Minimum 10 values

Value Count Frequency (%)
1123609 1 < 0.1%
1123610 1 < 0.1%
1123611 1 < 0.1%
1123612 7 < 0.1%
1123619 1 < 0.1%
1123622 1 < 0.1%
1123630 3 < 0.1%
1123633 1 < 0.1%
1123646 2 < 0.1%
1123673 7 < 0.1%

Maximum 10 values

Value Count Frequency (%)
45893529 3 < 0.1%
45893522 1 < 0.1%
45893508 2 < 0.1%
45893498 15 < 0.1%
45893497 913 < 0.1%
45893494 1 < 0.1%
45893489 232 < 0.1%
45893488 6943 0.1%
45893485 8 < 0.1%
45893482 1 < 0.1%

meddra_concept_id
Real number (ℝ)

Missing 

Distinct 10774
Distinct (%) 0.2%
Missing 901107
Missing (%) 12.9%
Infinite 0
Infinite (%) 0.0%
Mean 36484980
Minimum 788090
Maximum 46277190
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 788090
5-th percentile 35204948
Q1 35808914
median 36211677
Q3 36818682
95-th percentile 42889163
Maximum 46277190
Range 45489100
Interquartile range (IQR) 1009768

Descriptive statistics

Standard deviation 2868748.5
Coefficient of variation (CV) 0.078628205
Kurtosis 93.491362
Mean 36484980
Median Absolute Deviation (MAD) 503912
Skewness -6.6628983
Sum 2.2253587 × 10^14
Variance 8.229718 × 10^12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
37522220 41704 0.6%
35809327 38553 0.6%
35809054 37057 0.5%
35708208 36785 0.5%
35708202 29745 0.4%
36718132 27832 0.4%
35708093 24694 0.4%
35708154 22279 0.3%
35809243 21828 0.3%
35205038 21718 0.3%
Other values (10764) 5797188 82.8%
(Missing) 901107 12.9%

Minimum 10 values

Value Count Frequency (%)
788090 10 < 0.1%
788094 252 < 0.1%
788095 32 < 0.1%
788096 29 < 0.1%
788098 94 < 0.1%
788100 94 < 0.1%
788104 75 < 0.1%
788105 102 < 0.1%
788115 342 < 0.1%
788120 214 < 0.1%

Maximum 10 values

Value Count Frequency (%)
46277190 29 < 0.1%
46277169 47 < 0.1%
46277163 30 < 0.1%
46276846 39 < 0.1%
46276844 167 < 0.1%
46276840 29 < 0.1%
46276826 19 < 0.1%
46276825 92 < 0.1%
46276824 20 < 0.1%
46276815 47 < 0.1%

cluster_id
Categorical

Missing 

Distinct 4
Distinct (%) < 0.1%
Missing 6539667
Missing (%) 93.4%
Memory size 53.4 MiB
2.0 300687
4.0 137004
1.0 20127
3.0 3005

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 1382469
Distinct characters 6
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 2.0
2nd row 2.0
3rd row 2.0
4th row 2.0
5th row 2.0

Common Values

Value Count Frequency (%)
2.0 300687 4.3%
4.0 137004 2.0%
1.0 20127 0.3%
3.0 3005 < 0.1%
(Missing) 6539667 93.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
2.0 300687 65.2%
4.0 137004 29.7%
1.0 20127 4.4%
3.0 3005 0.7%
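
The same counts appear with two different percentage columns in this section. The sketch below, with illustrative names, shows that the first set matches shares of all rows and the second matches shares of non-missing rows; the report does not label the denominators, so this is inferred from the numbers.

```python
# Counts copied from the cluster_id tables; names are illustrative.
counts = {"2.0": 300_687, "4.0": 137_004, "1.0": 20_127, "3.0": 3_005}
n_rows = 7_000_490

present = sum(counts.values())   # rows where cluster_id is not missing
missing = n_rows - present

share_of_all = {k: 100 * v / n_rows for k, v in counts.items()}
share_of_present = {k: 100 * v / present for k, v in counts.items()}

for value in counts:
    print(f"{value}: {share_of_all[value]:.1f}% of all rows, "
          f"{share_of_present[value]:.1f}% of non-missing rows")
print(f"missing rows: {missing}")
```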

Most occurring characters

Value Count Frequency (%)
. 460823 33.3%
0 460823 33.3%
2 300687 21.7%
4 137004 9.9%
1 20127 1.5%
3 3005 0.2%

Most occurring categories

Value Count Frequency (%)
(unknown) 1382469 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1382469 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1382469 100.0%

gt_null_statistic
Categorical

Missing 

Distinct 2
Distinct (%) < 0.1%
Missing 6539667
Missing (%) 93.4%
Memory size 53.4 MiB
0.0 307907
1.0 152916

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 1382469
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1.0
2nd row 1.0
3rd row 1.0
4th row 1.0
5th row 1.0

Common Values

Value Count Frequency (%)
0.0 307907 4.4%
1.0 152916 2.2%
(Missing) 6539667 93.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 307907 66.8%
1.0 152916 33.2%

Most occurring characters

Value Count Frequency (%)
0 768730 55.6%
. 460823 33.3%
1 152916 11.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 1382469 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1382469 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1382469 100.0%

gt_null_99
Categorical

Imbalance  Missing 

Distinct 2
Distinct (%) < 0.1%
Missing 6539667
Missing (%) 93.4%
Memory size 53.4 MiB
0.0 441385
1.0 19438

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 1382469
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 1.0
3rd row 0.0
4th row 0.0
5th row 1.0

Common Values

Value Count Frequency (%)
0.0 441385 6.3%
1.0 19438 0.3%
(Missing) 6539667 93.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 441385 95.8%
1.0 19438 4.2%
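
The report flags this variable as "highly imbalanced (74.8%)" without stating how the score is computed. One definition that reproduces the figure is one minus the Shannon entropy of the class shares, normalized by the maximum entropy for that number of classes. The sketch below assumes that definition; the function name `imbalance` is illustrative, and the handling of single-class input is a choice made here, not taken from the report.

```python
import math
from typing import Sequence


def imbalance(counts: Sequence[int]) -> float:
    """Return 1 - H(p) / log2(k) for k class counts (assumed definition)."""
    k = len(counts)
    if k < 2:
        # Assumption: a single class is scored 0 to avoid dividing by log2(1) = 0.
        return 0.0
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(k)


# gt_null_99: counts of 0.0 and 1.0 among non-missing rows.
gt_null_99 = imbalance([441_385, 19_438])
# max_score_nichd: the seven category counts from its Common Values table.
max_score_nichd = imbalance([302_433, 134_834, 7_324, 5_849, 5_320, 2_616, 2_447])

print(f"gt_null_99:      {gt_null_99:.1%}")
print(f"max_score_nichd: {max_score_nichd:.1%}")
```

Under this definition a perfectly balanced variable scores 0 and a variable dominated by one class approaches 1. The same formula also matches the 55.5% alert for max_score_nichd, which supports the assumption but does not confirm it.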

Most occurring characters

Value Count Frequency (%)
0 902208 65.3%
. 460823 33.3%
1 19438 1.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 1382469 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1382469 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1382469 100.0%

max_score_nichd
Categorical

Imbalance  Missing 

Distinct 7
Distinct (%) < 0.1%
Missing 6539667
Missing (%) 93.4%
Memory size 53.4 MiB
late_adolescence 302433
term_neonatal 134834
early_childhood 7324
middle_childhood 5849
toddler 5320
Other values (2) 5063

Length

Max length 17
Median length 16
Mean length 14.956643
Min length 7

Characters and Unicode

Total characters 6892365
Distinct characters 16
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row late_adolescence
2nd row late_adolescence
3rd row late_adolescence
4th row late_adolescence
5th row late_adolescence

Common Values

Value Count Frequency (%)
late_adolescence 302433 4.3%
term_neonatal 134834 1.9%
early_childhood 7324 0.1%
middle_childhood 5849 0.1%
toddler 5320 0.1%
infancy 2616 < 0.1%
early_adolescence 2447 < 0.1%
(Missing) 6539667 93.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
late_adolescence 302433 65.6%
term_neonatal 134834 29.3%
early_childhood 7324 1.6%
middle_childhood 5849 1.3%
toddler 5320 1.2%
infancy 2616 0.6%
early_adolescence 2447 0.5%

Most occurring characters

Value Count Frequency (%)
e 1507681 21.9%
a 889368 12.9%
l 776260 11.3%
c 625549 9.1%
n 579780 8.4%
t 577421 8.4%
o 471380 6.8%
_ 452887 6.6%
d 353564 5.1%
s 304880 4.4%
Other values (6) 353595 5.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6892365 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6892365 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6892365 100.0%

cluster_name
Categorical

Missing 

Distinct 4
Distinct (%) < 0.1%
Missing 6539667
Missing (%) 93.4%
Memory size 53.4 MiB
Increase 300687
Decrease 137004
Plateau 20127
Inverse Plateau 3005

Length

Max length 15
Median length 8
Mean length 8.0019704
Min length 7

Characters and Unicode

Total characters 3687492
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Increase
2nd row Increase
3rd row Increase
4th row Increase
5th row Increase

Common Values

Value Count Frequency (%)
Increase 300687 4.3%
Decrease 137004 2.0%
Plateau 20127 0.3%
Inverse Plateau 3005 < 0.1%
(Missing) 6539667 93.4%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
increase 300687 64.8%
decrease 137004 29.5%
plateau 23132 5.0%
inverse 3005 0.6%

Most occurring characters

Value Count Frequency (%)
e 1041528 28.2%
a 483955 13.1%
r 440696 12.0%
s 440696 12.0%
c 437691 11.9%
n 303692 8.2%
I 303692 8.2%
D 137004 3.7%
P 23132 0.6%
l 23132 0.6%
Other values (4) 52274 1.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 3687492 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 3687492 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 3687492 100.0%

ade_nreports
Real number (ℝ)

Missing  Skewed 

Distinct 511
Distinct (%) 0.1%
Missing 6539667
Missing (%) 93.4%
Infinite 0
Infinite (%) 0.0%
Mean 5.0482984
Minimum 1
Maximum 3841
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
median 1
Q3 3
95-th percentile 17
Maximum 3841
Range 3840
Interquartile range (IQR) 2

Descriptive statistics

Standard deviation 23.349784
Coefficient of variation (CV) 4.6252782
Kurtosis 6374.056
Mean 5.0482984
Median Absolute Deviation (MAD) 0
Skewness 56.060419
Sum 2326372
Variance 545.21243
Monotonicity Not monotonic
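
Several of the descriptive statistics above are functions of one another, which gives a cheap way to sanity-check the table. The sketch below recomputes the coefficient of variation, variance and sum from the reported mean, standard deviation and non-missing count; names are illustrative, and small differences are expected because the inputs are rounded.

```python
# Figures copied from the report; names are illustrative.
n_rows = 7_000_490
missing = 6_539_667
mean = 5.0482984
std = 23.349784

n_present = n_rows - missing   # statistics are computed over non-missing rows
cv = std / mean                # coefficient of variation
variance = std ** 2
total = mean * n_present       # should approximate the reported Sum

print(f"non-missing rows: {n_present}")
print(f"CV:       {cv:.4f}")
print(f"variance: {variance:.2f}")
print(f"sum:      {total:.0f}")
```

A CV above 4 together with a median of 1 and a maximum of 3841 is consistent with the "highly skewed" alert for this variable.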
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
1 238889 3.4%
2 74157 1.1%
3 36262 0.5%
4 21904 0.3%
5 15331 0.2%
6 11161 0.2%
7 8472 0.1%
8 6574 0.1%
9 5210 0.1%
10 4335 0.1%
Other values (501) 38528 0.6%
(Missing) 6539667 93.4%

Minimum 10 values

Value Count Frequency (%)
1 238889 3.4%
2 74157 1.1%
3 36262 0.5%
4 21904 0.3%
5 15331 0.2%
6 11161 0.2%
7 8472 0.1%
8 6574 0.1%
9 5210 0.1%
10 4335 0.1%

Maximum 10 values

Value Count Frequency (%)
3841 1 < 0.1%
3791 1 < 0.1%
3338 1 < 0.1%
3302 1 < 0.1%
2618 1 < 0.1%
2115 1 < 0.1%
2059 1 < 0.1%
2015 1 < 0.1%
2000 1 < 0.1%
1898 1 < 0.1%

table_name
Categorical

Imbalance 

Distinct 16
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 53.4 MiB
ade_nichd 3225859
ade_raw 2326383
ade_nichd_enrichment 621762
ade 460823
gene_expression 194054
Other values (11) 171609

Length

Max length 46
Median length 21
Mean length 9.153352
Min length 3

Characters and Unicode

Total characters 64077949
Distinct characters 24
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ade
2nd row ade
3rd row ade
4th row ade
5th row ade

Common Values

Value Count Frequency (%)
ade_nichd 3225859 46.1%
ade_raw 2326383 33.2%
ade_nichd_enrichment 621762 8.9%
ade 460823 6.6%
gene_expression 194054 2.8%
ade_null_distribution 70000 1.0%
sider 67028 1.0%
event 16941 0.2%
drug_gene 12322 0.2%
ryan 2349 < 0.1%
Other values (6) 2969 < 0.1%

Length

Histogram of lengths of the category
Value Count Frequency (%)
ade_nichd 3225859 46.1%
ade_raw 2326383 33.2%
ade_nichd_enrichment 621762 8.9%
ade 460823 6.6%
gene_expression 194054 2.8%
ade_null_distribution 70000 1.0%
sider 67028 1.0%
event 16941 0.2%
drug_gene 12322 0.2%
ryan 2349 < 0.1%
Other values (6) 2969 < 0.1%

Most occurring characters

Value Count Frequency (%)
d 10702893 16.7%
a 9034081 14.1%
e 8853399 13.8%
_ 7143354 11.1%
n 5652860 8.8%
i 4941981 7.7%
c 4469633 7.0%
h 4469383 7.0%
r 3296517 5.1%
w 2326398 3.6%
Other values (14) 3187450 5.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 64077949 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 64077949 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 64077949 100.0%

nichd
Categorical

Missing 

Distinct 7
Distinct (%) < 0.1%
Missing 562425
Missing (%) 8.0%
Memory size 53.4 MiB
early_adolescence 1530492
middle_childhood 1082064
late_adolescence 1018469
early_childhood 819688
infancy 686823
Other values (2) 1300529

Length

Max length 17
Median length 16
Mean length 13.915626
Min length 7

Characters and Unicode

Total characters 89589706
Distinct characters 16
Distinct categories 1
Distinct scripts 1
Distinct blocks 1

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row term_neonatal
2nd row infancy
3rd row toddler
4th row early_childhood
5th row middle_childhood

Common Values

Value Count Frequency (%)
early_adolescence 1530492 21.9%
middle_childhood 1082064 15.5%
late_adolescence 1018469 14.5%
early_childhood 819688 11.7%
infancy 686823 9.8%
toddler 674524 9.6%
term_neonatal 626005 8.9%
(Missing) 562425 8.0%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
early_adolescence 1530492 23.8%
middle_childhood 1082064 16.8%
late_adolescence 1018469 15.8%
early_childhood 819688 12.7%
infancy 686823 10.7%
toddler 674524 10.5%
term_neonatal 626005 9.7%

Most occurring characters

Value Count Frequency (%)
e 14024130 15.7%
l 10201955 11.4%
d 9865641 11.0%
a 7856443 8.8%
c 7686497 8.6%
o 7652994 8.5%
n 5174617 5.8%
_ 5076718 5.7%
h 3803504 4.2%
i 3670639 4.1%
Other values (6) 14576568 16.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 89589706 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 89589706 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 89589706 100.0%

gam_score
Real number (ℝ)

Missing  Skewed 

Distinct 3271135
Distinct (%) 99.2%
Missing 3704631
Missing (%) 52.9%
Infinite 0
Infinite (%) 0.0%
Mean 0.71900334
Minimum -1583.2964
Maximum 1100.9942
Zeros 0
Zeros (%) 0.0%
Negative 998817
Negative (%) 14.3%
Memory size 53.4 MiB
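
As a cross-check, the percentages and memory size above can be reproduced from the raw counts. The sketch below assumes that Missing (%) is taken over all 7000490 observations, that Distinct (%) is taken over non-missing values only, and that each value occupies 8 bytes. These assumptions are inferred from the figures, not stated in the report.

```python
# Counts copied from the report; the formulas are assumptions that reproduce them.
N_OBSERVATIONS = 7_000_490   # "Number of observations" in the overview
missing = 3_704_631          # gam_score "Missing"
distinct = 3_271_135         # gam_score "Distinct"

non_missing = N_OBSERVATIONS - missing
missing_pct = 100 * missing / N_OBSERVATIONS   # share of all rows
distinct_pct = 100 * distinct / non_missing    # share of non-missing rows
memory_mib = N_OBSERVATIONS * 8 / 2**20        # assumes 8 bytes per stored value

print(f"{missing_pct:.1f}% missing, {distinct_pct:.1f}% distinct, {memory_mib:.1f} MiB")
```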

Quantile statistics

Minimum -1583.2964
5-th percentile -0.46506803
Q1 -2.2496754 × 10⁻⁶
median 7.2532587 × 10⁻⁵
Q3 0.90596541
95-th percentile 4.0027948
Maximum 1100.9942
Range 2684.2906
Interquartile range (IQR) 0.90596766

Descriptive statistics

Standard deviation 2.8565971
Coefficient of variation (CV) 3.9729956
Kurtosis 57109.645
Mean 0.71900334
Median Absolute Deviation (MAD) 0.11186695
Skewness -68.479068
Sum 2369733.6
Variance 8.1601469
Monotonicity Not monotonic
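
The derived statistics above follow from the basic ones by simple identities. A minimal sketch, using values copied from this section and assuming the usual definitions (CV = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum, IQR = Q3 - Q1):

```python
# Values copied from the gam_score summary; definitions are the standard ones.
mean, std = 0.71900334, 2.8565971
minimum, maximum = -1583.2964, 1100.9942
q1, q3 = -2.2496754e-6, 0.90596541

cv = std / mean                  # coefficient of variation
variance = std ** 2
value_range = maximum - minimum
iqr = q3 - q1                    # interquartile range

print(f"CV={cv:.4f} variance={variance:.4f} range={value_range:.4f} IQR={iqr:.6f}")
```

Because CV divides by the mean, it is large whenever the mean is close to zero and takes the sign of the mean.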
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
-2.479471767 6 < 0.1%
65.93800615 6 < 0.1%
54.05488581 6 < 0.1%
42.34407438 6 < 0.1%
30.87659325 6 < 0.1%
19.63303383 6 < 0.1%
8.542334779 6 < 0.1%
0.2151436175 5 < 0.1%
1.654897268 5 < 0.1%
3.151476826 5 < 0.1%
Other values (3271125) 3295802 47.1%
(Missing) 3704631 52.9%

Value Count Frequency (%)
-1583.296417 1 < 0.1%
-1126.330337 1 < 0.1%
-861.4814983 1 < 0.1%
-817.4685841 1 < 0.1%
-804.1519415 1 < 0.1%
-703.3395573 1 < 0.1%
-612.1097067 1 < 0.1%
-580.7899328 1 < 0.1%
-571.3190674 1 < 0.1%
-405.5090676 1 < 0.1%

Value Count Frequency (%)
1100.994166 1 < 0.1%
783.8090356 1 < 0.1%
726.8397677 1 < 0.1%
587.9975242 1 < 0.1%
517.7999509 1 < 0.1%
490.2477456 1 < 0.1%
324.3546238 1 < 0.1%
324.1973924 1 < 0.1%
312.4392221 1 < 0.1%
307.908627 1 < 0.1%

norm
Real number (ℝ)

Missing  Zeros 

Distinct 2285419
Distinct (%) 70.8%
Missing 3774631
Missing (%) 53.9%
Infinite 0
Infinite (%) 0.0%
Mean 0.50720471
Minimum 0
Maximum 1
Zeros 460837
Zeros (%) 6.6%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0.17607395
median 0.50328406
Q3 0.8293085
95-th percentile 1
Maximum 1
Range 1
Interquartile range (IQR) 0.65323455

Descriptive statistics

Standard deviation 0.33819667
Coefficient of variation (CV) 0.66678535
Kurtosis -1.287696
Mean 0.50720471
Median Absolute Deviation (MAD) 0.32664082
Skewness -0.03209883
Sum 1636170.9
Variance 0.11437699
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
0 460837 6.6%
1 460837 6.6%
0.1656521642 31 < 0.1%
0.1656521642 23 < 0.1%
0.1610982506 21 < 0.1%
0.1610982506 18 < 0.1%
0.1656521642 17 < 0.1%
0.1656521642 16 < 0.1%
0.1656521642 13 < 0.1%
0.3317725603 13 < 0.1%
Other values (2285409) 2304033 32.9%
(Missing) 3774631 53.9%

Value Count Frequency (%)
0 460837 6.6%
4.617381605 × 10⁻⁶ 1 < 0.1%
1.271763111 × 10⁻⁵ 1 < 0.1%
3.97711657 × 10⁻⁵ 1 < 0.1%
5.064489504 × 10⁻⁵ 1 < 0.1%
5.287215323 × 10⁻⁵ 1 < 0.1%
6.223397524 × 10⁻⁵ 1 < 0.1%
6.562148706 × 10⁻⁵ 1 < 0.1%
6.701805892 × 10⁻⁵ 1 < 0.1%
7.308616937 × 10⁻⁵ 1 < 0.1%

Value Count Frequency (%)
1 460837 6.6%
0.9999999367 1 < 0.1%
0.9999994625 1 < 0.1%
0.9999979339 1 < 0.1%
0.9999960046 1 < 0.1%
0.9999957444 1 < 0.1%
0.9999956462 1 < 0.1%
0.9999906414 1 < 0.1%
0.9999903979 1 < 0.1%
0.9999881375 1 < 0.1%

gam_score_se
Real number (ℝ)

Missing  Skewed 

Distinct 3201135
Distinct (%) 99.2%
Missing 3774631
Missing (%) 53.9%
Infinite 0
Infinite (%) 0.0%
Mean 3.1151531
Minimum 2.8668088 × 10⁻⁵
Maximum 955748.96
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 2.8668088 × 10⁻⁵
5-th percentile 0.0013082665
Q1 0.0045930821
median 0.31811281
Q3 0.90509856
95-th percentile 2.4022826
Maximum 955748.96
Range 955748.96
Interquartile range (IQR) 0.90050548

Descriptive statistics

Standard deviation 851.67572
Coefficient of variation (CV) 273.39771
Kurtosis 567079.58
Mean 3.1151531
Median Absolute Deviation (MAD) 0.31555365
Skewness 635.07191
Sum 10049045
Variance 725351.53
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
25.49875765 6 < 0.1%
32.22233532 6 < 0.1%
27.308445 6 < 0.1%
23.77487554 6 < 0.1%
21.20572913 6 < 0.1%
19.90467543 6 < 0.1%
21.01185789 6 < 0.1%
5.798424708 5 < 0.1%
4.297302722 5 < 0.1%
3.351460234 5 < 0.1%
Other values (3201125) 3225802 46.1%
(Missing) 3774631 53.9%

Value Count Frequency (%)
2.866808817 × 10⁻⁵ 1 < 0.1%
2.903729106 × 10⁻⁵ 1 < 0.1%
2.999385412 × 10⁻⁵ 1 < 0.1%
3.23813602 × 10⁻⁵ 1 < 0.1%
3.265033259 × 10⁻⁵ 1 < 0.1%
3.279838455 × 10⁻⁵ 1 < 0.1%
3.348108544 × 10⁻⁵ 1 < 0.1%
3.387884774 × 10⁻⁵ 1 < 0.1%
3.511656996 × 10⁻⁵ 1 < 0.1%
3.533356434 × 10⁻⁵ 1 < 0.1%

Value Count Frequency (%)
955748.9571 1 < 0.1%
431620.9721 1 < 0.1%
427811.0042 1 < 0.1%
373427.1342 1 < 0.1%
301929.7694 1 < 0.1%
274107.4666 1 < 0.1%
257861.8838 1 < 0.1%
249115.2338 1 < 0.1%
248583.5542 1 < 0.1%
221625.9706 1 < 0.1%

gam_score_90mse
Real number (ℝ)

Missing  Skewed 

Distinct 3201135
Distinct (%) 99.2%
Missing 3774631
Missing (%) 53.9%
Infinite 0
Infinite (%) 0.0%
Mean -4.3948504
Minimum -1572410
Maximum 47.232281
Zeros 0
Zeros (%) 0.0%
Negative 2658174
Negative (%) 38.0%
Memory size 53.4 MiB

Quantile statistics

Minimum -1572410
5-th percentile -2.6165057
Q1 -0.52355155
median -0.012332565
Q3 -0.0026387694
95-th percentile 1.5253107
Maximum 47.232281
Range 1572457.2
Interquartile range (IQR) 0.52091278

Descriptive statistics

Standard deviation 1401.2542
Coefficient of variation (CV) -318.84002
Kurtosis 566898.09
Mean -4.3948504
Median Absolute Deviation (MAD) 0.20930088
Skewness -634.91439
Sum -14177168
Variance 1963513.3
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
-44.42492811 6 < 0.1%
12.93226455 6 < 0.1%
9.132493784 6 < 0.1%
3.234404111 6 < 0.1%
-4.006831171 6 < 0.1%
-13.11015725 6 < 0.1%
-26.02217145 6 < 0.1%
-9.323265028 5 < 0.1%
-5.41416571 5 < 0.1%
-2.361675259 5 < 0.1%
Other values (3201125) 3225802 46.1%
(Missing) 3774631 53.9%

Value Count Frequency (%)
-1572410.002 1 < 0.1%
-709704.0598 1 < 0.1%
-703870.299 1 < 0.1%
-613560.7959 1 < 0.1%
-497056.3389 1 < 0.1%
-450388.9825 1 < 0.1%
-424448.4375 1 < 0.1%
-409866.166 1 < 0.1%
-408933.5692 1 < 0.1%
-364845.6755 1 < 0.1%

Value Count Frequency (%)
47.23228124 1 < 0.1%
44.19872864 1 < 0.1%
42.4465221 1 < 0.1%
35.50573457 1 < 0.1%
34.45740278 1 < 0.1%
34.08359317 1 < 0.1%
33.28758486 1 < 0.1%
31.85564911 1 < 0.1%
31.5168847 1 < 0.1%
29.72313926 1 < 0.1%

gam_score_90pse
Real number (ℝ)

Missing  Skewed 

Distinct 3201135
Distinct (%) 99.2%
Missing 3774631
Missing (%) 53.9%
Infinite 0
Infinite (%) 0.0%
Mean 5.8540032
Minimum -5.3409881
Maximum 1572004.1
Zeros 0
Zeros (%) 0.0%
Negative 70408
Negative (%) 1.0%
Memory size 53.4 MiB

Quantile statistics

Minimum -5.3409881
5-th percentile 0.0016412704
Q1 0.0066339165
median 0.44092014
Q3 2.5637388
95-th percentile 7.5232519
Maximum 1572004.1
Range 1572009.4
Interquartile range (IQR) 2.5571049

Descriptive statistics

Standard deviation 1400.7648
Coefficient of variation (CV) 239.28323
Kurtosis 567252.78
Mean 5.8540032
Median Absolute Deviation (MAD) 0.43905083
Skewness 635.22388
Sum 18884189
Variance 1962142.1
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
39.46598457 6 < 0.1%
118.9437478 6 < 0.1%
98.97727784 6 < 0.1%
81.45374464 6 < 0.1%
65.76001766 6 < 0.1%
52.3762249 6 < 0.1%
43.106841 6 < 0.1%
9.753552263 5 < 0.1%
8.723960247 5 < 0.1%
8.664628912 5 < 0.1%
Other values (3201125) 3225802 46.1%
(Missing) 3774631 53.9%

Value Count Frequency (%)
-5.34098812 1 < 0.1%
-4.881815974 1 < 0.1%
-4.879823252 1 < 0.1%
-4.54468279 1 < 0.1%
-4.46198962 1 < 0.1%
-4.425665167 1 < 0.1%
-4.409437007 1 < 0.1%
-4.300086318 1 < 0.1%
-4.155693895 1 < 0.1%
-4.053676269 1 < 0.1%

Value Count Frequency (%)
1572004.067 1 < 0.1%
710328.9383 1 < 0.1%
703627.9047 1 < 0.1%
615014.4754 1 < 0.1%
496292.6023 1 < 0.1%
451424.5824 1 < 0.1%
423917.16 1 < 0.1%
409722.9531 1 < 0.1%
408906.3241 1 < 0.1%
364303.7678 1 < 0.1%

D
Real number (ℝ)

Missing  Zeros 

Distinct 633
Distinct (%) < 0.1%
Missing 3774631
Missing (%) 53.9%
Infinite 0
Infinite (%) 0.0%
Mean 329.8544
Minimum 0
Maximum 7849
Zeros 126892
Zeros (%) 1.8%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 1
Q1 17
median 76
Q3 309
95-th percentile 1469
Maximum 7849
Range 7849
Interquartile range (IQR) 292

Descriptive statistics

Standard deviation 716.44723
Coefficient of variation (CV) 2.1720105
Kurtosis 30.457729
Mean 329.8544
Median Absolute Deviation (MAD) 72
Skewness 4.8510509
Sum 1.0640638 × 10⁹
Variance 513296.63
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
0 126892 1.8%
1 88764 1.3%
2 74613 1.1%
5 64044 0.9%
3 54037 0.8%
4 49001 0.7%
8 41707 0.6%
6 41397 0.6%
7 39833 0.6%
9 35431 0.5%
Other values (623) 2610140 37.3%
(Missing) 3774631 53.9%

Value Count Frequency (%)
0 126892 1.8%
1 88764 1.3%
2 74613 1.1%
3 54037 0.8%
4 49001 0.7%
5 64044 0.9%
6 41397 0.6%
7 39833 0.6%
8 41707 0.6%
9 35431 0.5%

Value Count Frequency (%)
7849 466 < 0.1%
6927 3461 < 0.1%
6538 2301 < 0.1%
5923 2815 < 0.1%
5659 3461 < 0.1%
5506 3840 0.1%
5068 3775 0.1%
5053 2346 < 0.1%
4975 2449 < 0.1%
4628 2815 < 0.1%

E
Real number (ℝ)

Missing  Zeros 

Distinct 669
Distinct (%) < 0.1%
Missing 3774631
Missing (%) 53.9%
Infinite 0
Infinite (%) 0.0%
Mean 96.446233
Minimum 0
Maximum 7397
Zeros 538729
Zeros (%) 7.7%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 2
median 11
Q3 56
95-th percentile 448
Maximum 7397
Range 7397
Interquartile range (IQR) 54

Descriptive statistics

Standard deviation 323.39967
Coefficient of variation (CV) 3.3531602
Kurtosis 123.79768
Mean 96.446233
Median Absolute Deviation (MAD) 11
Skewness 9.0930663
Sum 3.1112195 × 10⁸
Variance 104587.34
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
0 538729 7.7%
1 254758 3.6%
2 165395 2.4%
3 131393 1.9%
4 103933 1.5%
5 89320 1.3%
6 72786 1.0%
7 66875 1.0%
8 60556 0.9%
9 52737 0.8%
Other values (659) 1689377 24.1%
(Missing) 3774631 53.9%

Value Count Frequency (%)
0 538729 7.7%
1 254758 3.6%
2 165395 2.4%
3 131393 1.9%
4 103933 1.5%
5 89320 1.3%
6 72786 1.0%
7 66875 1.0%
8 60556 0.9%
9 52737 0.8%

Value Count Frequency (%)
7397 799 < 0.1%
4972 789 < 0.1%
4286 799 < 0.1%
4284 733 < 0.1%
4138 772 < 0.1%
4087 815 < 0.1%
4024 542 < 0.1%
3345 768 < 0.1%
3339 789 < 0.1%
3015 772 < 0.1%

DE
Real number (ℝ)

Missing  Skewed  Zeros 

Distinct 376
Distinct (%) < 0.1%
Missing 3774631
Missing (%) 53.9%
Infinite 0
Infinite (%) 0.0%
Mean 0.72117969
Minimum 0
Maximum 3193
Zeros 2453256
Zeros (%) 35.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 3
Maximum 3193
Range 3193
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 5.9236867
Coefficient of variation (CV) 8.2138845
Kurtosis 61924.677
Mean 0.72117969
Median Absolute Deviation (MAD) 0
Skewness 168.53271
Sum 2326424
Variance 35.090064
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
0 2453256 35.0%
1 468144 6.7%
2 122064 1.7%
3 55385 0.8%
4 31452 0.4%
5 20519 0.3%
6 14116 0.2%
7 10203 0.1%
8 7690 0.1%
9 5964 0.1%
Other values (366) 37066 0.5%
(Missing) 3774631 53.9%

Value Count Frequency (%)
0 2453256 35.0%
1 468144 6.7%
2 122064 1.7%
3 55385 0.8%
4 31452 0.4%
5 20519 0.3%
6 14116 0.2%
7 10203 0.1%
8 7690 0.1%
9 5964 0.1%

Value Count Frequency (%)
3193 1 < 0.1%
2629 1 < 0.1%
2611 1 < 0.1%
1747 1 < 0.1%
1697 1 < 0.1%
1591 1 < 0.1%
1451 1 < 0.1%
1239 1 < 0.1%
1163 1 < 0.1%
1096 1 < 0.1%

ade_name
Text

Missing 

Distinct 494451
Distinct (%) 15.0%
Missing 3707027
Missing (%) 53.0%
Memory size 53.4 MiB

Length

Max length 132
Median length 110
Mean length 43.833084
Min length 13

Characters and Unicode

Total characters 144362641
Distinct characters 72
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
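
The length and character figures above are linked: the mean length appears to equal the total character count divided by the number of non-missing values. A minimal sketch under that assumption, using counts copied from this section and the overview:

```python
# Counts copied from the report; the formula is an assumption that matches them.
N_OBSERVATIONS = 7_000_490       # "Number of observations" in the overview
missing = 3_707_027              # ade_name "Missing"
total_characters = 144_362_641   # ade_name "Total characters"

non_missing = N_OBSERVATIONS - missing
mean_length = total_characters / non_missing

print(f"{non_missing} non-missing values, mean length {mean_length:.4f}")
```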

Unique

Unique 20182
Unique (%) 0.6%

Sample

1st row valsartan and sacubitril and Fatigue
2nd row valsartan and sacubitril and Fatigue
3rd row valsartan and sacubitril and Fatigue
4th row valsartan and sacubitril and Fatigue
5th row valsartan and sacubitril and Fatigue
Value Count Frequency (%)
and 3337302 19.3%
systemic 1072063 6.2%
oral 1044760 6.0%
parenteral 407190 2.4%
rectal 261355 1.5%
increased 146701 0.8%
topical 141484 0.8%
disorder 128679 0.7%
decreased 117093 0.7%
infection 116006 0.7%
Other values (6622) 10528585 60.9%

Most occurring characters

Value Count Frequency (%)
a 14637195 10.1%
(space) 14007755 9.7%
e 12198949 8.5%
i 10830653 7.5%
n 10824260 7.5%
r 8914900 6.2%
o 8885093 6.2%
t 8082761 5.6%
s 7300786 5.1%
l 7159959 5.0%
Other values (62) 41520330 28.8%

Most occurring categories

Value Count Frequency (%)
(unknown) 144362641 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 144362641 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 144362641 100.0%

category
Categorical

Missing 

Distinct 28
Distinct (%) < 0.1%
Missing 6378728
Missing (%) 91.1%
Memory size 53.4 MiB
hlt_atc5 48305
hlt_atc4 46083
hlgt_atc5 44660
hlt_atc3 42357
hlgt_atc4 39421
Other values (23) 400936

Length

Max length 9
Median length 8
Mean length 7.6795189
Min length 2

Characters and Unicode

Total characters 4774833
Distinct characters 15
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row soc
2nd row soc
3rd row soc
4th row soc
5th row soc

Common Values

Value Count Frequency (%)
hlt_atc5 48305 0.7%
hlt_atc4 46083 0.7%
hlgt_atc5 44660 0.6%
hlt_atc3 42357 0.6%
hlgt_atc4 39421 0.6%
hlt_atc2 38824 0.6%
pt_atc4 34716 0.5%
pt_atc3 33661 0.5%
soc_atc5 33634 0.5%
hlgt_atc3 33007 0.5%
Other values (18) 227094 3.2%
(Missing) 6378728 91.1%

Length

Histogram of lengths of the category

Value Count Frequency (%)
hlt_atc5 48305 7.8%
hlt_atc4 46083 7.4%
hlgt_atc5 44660 7.2%
hlt_atc3 42357 6.8%
hlgt_atc4 39421 6.3%
hlt_atc2 38824 6.2%
pt_atc4 34716 5.6%
pt_atc3 33661 5.4%
soc_atc5 33634 5.4%
hlgt_atc3 33007 5.3%
Other values (18) 227094 36.5%
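
The two frequency tables for category list the same counts with different percentages. The figures are consistent with the first table dividing by all observations and the second dividing by non-missing values only. A minimal sketch under that assumption, using the hlt_atc5 row:

```python
# Counts copied from the report; the choice of denominators is an assumption.
N_OBSERVATIONS = 7_000_490   # "Number of observations" in the overview
missing = 6_378_728          # category "Missing"
count = 48_305               # rows with category == "hlt_atc5"

share_of_all_rows = 100 * count / N_OBSERVATIONS
share_of_non_missing = 100 * count / (N_OBSERVATIONS - missing)

print(f"{share_of_all_rows:.1f}% of all rows, {share_of_non_missing:.1f}% of non-missing rows")
```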

Most occurring characters

Value Count Frequency (%)
t 1117990 23.4%
c 670820 14.0%
a 588243 12.3%
_ 578805 12.1%
h 374415 7.8%
l 374415 7.8%
g 161416 3.4%
p 155332 3.3%
4 144860 3.0%
5 132031 2.8%
Other values (5) 476506 10.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 4774833 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 4774833 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 4774833 100.0%

atc_concept_name
Text

Missing 

Distinct 2455
Distinct (%) 0.4%
Missing 6310612
Missing (%) 90.1%
Memory size 53.4 MiB

Length

Max length 92
Median length 62
Mean length 21.418575
Min length 3

Characters and Unicode

Total characters 14776204
Distinct characters 67
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 924 ?
Unique (%) 0.1%

Sample

1st row nan
2nd row nan
3rd row nan
4th row nan
5th row nan
Value Count Frequency (%)
and 95141 5.7%
for 64180 3.9%
agents 63246 3.8%
other 48817 2.9%
use 44238 2.7%
nan 42957 2.6%
systemic 39690 2.4%
system 37181 2.2%
antineoplastic 28568 1.7%
drugs 23906 1.4%
Other values (1649) 1171844 70.6%

Most occurring characters

Value Count Frequency (%)
(space) 969890 6.6%
A 798268 5.4%
S 789448 5.3%
T 770825 5.2%
I 729554 4.9%
E 668576 4.5%
i 641288 4.3%
N 634924 4.3%
e 594585 4.0%
n 563703 3.8%
Other values (57) 7615143 51.5%

Most occurring categories

Value Count Frequency (%)
(unknown) 14776204 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 14776204 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 14776204 100.0%

meddra_concept_name
Text

Missing 

Distinct 10415
Distinct (%) 1.5%
Missing 6311700
Missing (%) 90.2%
Memory size 53.4 MiB

Length

Max length 92
Median length 66
Mean length 28.203089
Min length 3

Characters and Unicode

Total characters 19426006
Distinct characters 71
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 137 ?
Unique (%) < 0.1%

Sample

1st row Blood and lymphatic system disorders
2nd row Cardiac disorders
3rd row Congenital, familial and genetic disorders
4th row Ear and labyrinth disorders
5th row Endocrine disorders
Value Count Frequency (%)
and 207191 9.0%
disorders 187334 8.2%
nec 94150 4.1%
infections 37833 1.6%
conditions 30838 1.3%
system 28139 1.2%
vascular 26209 1.1%
congenital 25635 1.1%
tissue 23325 1.0%
excl 21037 0.9%
Other values (5977) 1611413 70.3%

Most occurring characters

Value Count Frequency (%)
i 1659702 8.5%
e 1636053 8.4%
(space) 1604314 8.3%
s 1572497 8.1%
a 1480431 7.6%
n 1355050 7.0%
r 1338282 6.9%
o 1281780 6.6%
t 1112618 5.7%
d 949757 4.9%
Other values (61) 5435522 28.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 19426006 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 19426006 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 19426006 100.0%

atc_concept_class_id
Categorical

Missing 

Distinct 5
Distinct (%) < 0.1%
Missing 6421685
Missing (%) 91.7%
Memory size 53.4 MiB
ATC4 142592
ATC5 126599
ATC3 123600
ATC2 108366
ATC1 77648

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 2315220
Distinct characters 8
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ATC1
2nd row ATC1
3rd row ATC1
4th row ATC1
5th row ATC1

Common Values

Value Count Frequency (%)
ATC4 142592 2.0%
ATC5 126599 1.8%
ATC3 123600 1.8%
ATC2 108366 1.5%
ATC1 77648 1.1%
(Missing) 6421685 91.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
atc4 142592 24.6%
atc5 126599 21.9%
atc3 123600 21.4%
atc2 108366 18.7%
atc1 77648 13.4%

Most occurring characters

Value Count Frequency (%)
A 578805 25.0%
T 578805 25.0%
C 578805 25.0%
4 142592 6.2%
5 126599 5.5%
3 123600 5.3%
2 108366 4.7%
1 77648 3.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 2315220 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 2315220 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 2315220 100.0%

meddra_concept_class_id
Categorical

Missing 

Distinct 9
Distinct (%) < 0.1%
Missing 6378728
Missing (%) 91.1%
Memory size 53.4 MiB
HLT 212999
HLGT 161416
PT 155332
SOC 82577
ATC5 5432
Other values (4) 4006

Length

Max length 4
Median length 3
Mean length 3.0249645
Min length 2

Characters and Unicode

Total characters 1880808
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row SOC
2nd row SOC
3rd row SOC
4th row SOC
5th row SOC

Common Values

Value Count Frequency (%)
HLT 212999 3.0%
HLGT 161416 2.3%
PT 155332 2.2%
SOC 82577 1.2%
ATC5 5432 0.1%
ATC4 2268 < 0.1%
ATC3 1110 < 0.1%
ATC2 530 < 0.1%
ATC1 98 < 0.1%
(Missing) 6378728 91.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
hlt 212999 34.3%
hlgt 161416 26.0%
pt 155332 25.0%
soc 82577 13.3%
atc5 5432 0.9%
atc4 2268 0.4%
atc3 1110 0.2%
atc2 530 0.1%
atc1 98 < 0.1%

Most occurring characters

Value Count Frequency (%)
T 539185 28.7%
H 374415 19.9%
L 374415 19.9%
G 161416 8.6%
P 155332 8.3%
C 92015 4.9%
S 82577 4.4%
O 82577 4.4%
A 9438 0.5%
5 5432 0.3%
Other values (4) 4006 0.2%

Most occurring categories

Value Count Frequency (%)
(unknown) 1880808 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1880808 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1880808 100.0%

a
Real number (ℝ)

Missing  Skewed 

Distinct 376
Distinct (%) 0.1%
Missing 6378728
Missing (%) 91.1%
Infinite 0
Infinite (%) 0.0%
Mean 2.1828208
Minimum 1
Maximum 1278
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
median 1
Q3 2
95-th percentile 5
Maximum 1278
Range 1277
Interquartile range (IQR) 1

Descriptive statistics

Standard deviation 10.96342
Coefficient of variation (CV) 5.0225928
Kurtosis 3383.6455
Mean 2.1828208
Median Absolute Deviation (MAD) 0
Skewness 48.919349
Sum 1357195
Variance 120.19657
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
1 465118 6.6%
2 79067 1.1%
3 27516 0.4%
4 13993 0.2%
5 7861 0.1%
6 5364 0.1%
7 3693 0.1%
8 2800 < 0.1%
9 2225 < 0.1%
10 1672 < 0.1%
Other values (366) 12453 0.2%
(Missing) 6378728 91.1%

Value Count Frequency (%)
1 465118 6.6%
2 79067 1.1%
3 27516 0.4%
4 13993 0.2%
5 7861 0.1%
6 5364 0.1%
7 3693 0.1%
8 2800 < 0.1%
9 2225 < 0.1%
10 1672 < 0.1%

Value Count Frequency (%)
1278 1 < 0.1%
1256 1 < 0.1%
1117 1 < 0.1%
1025 1 < 0.1%
1000 2 < 0.1%
996 1 < 0.1%
987 1 < 0.1%
975 1 < 0.1%
961 1 < 0.1%
924 1 < 0.1%

b
Real number (ℝ)

Missing 

Distinct 988
Distinct (%) 0.2%
Missing 6378728
Missing (%) 91.1%
Infinite 0
Infinite (%) 0.0%
Mean 5858.8967
Minimum 3503
Maximum 6792
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 3503
5-th percentile 4283
Q1 5373
median 6267
Q3 6521
95-th percentile 6792
Maximum 6792
Range 3289
Interquartile range (IQR) 1148

Descriptive statistics

Standard deviation 856.28301
Coefficient of variation (CV) 0.1461509
Kurtosis -1.0372225
Mean 5858.8967
Median Absolute Deviation (MAD) 524
Skewness -0.67791726
Sum 3.6428394 × 10⁹
Variance 733220.59
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
6521 73479 1.0%
6792 73103 1.0%
6267 68719 1.0%
6332 67381 1.0%
5376 65952 0.9%
4784 60853 0.9%
4283 55634 0.8%
6791 13178 0.2%
6266 12787 0.2%
6520 12555 0.2%
Other values (978) 118121 1.7%
(Missing) 6378728 91.1%

Value Count Frequency (%)
3503 1 < 0.1%
3654 1 < 0.1%
3714 1 < 0.1%
3768 1 < 0.1%
3783 1 < 0.1%
3795 1 < 0.1%
3808 1 < 0.1%
3815 1 < 0.1%
3843 1 < 0.1%
3844 1 < 0.1%

Value Count Frequency (%)
6792 73103 1.0%
6791 13178 0.2%
6790 4563 0.1%
6789 2431 < 0.1%
6788 1356 < 0.1%
6787 918 < 0.1%
6786 649 < 0.1%
6785 496 < 0.1%
6784 366 < 0.1%
6783 304 < 0.1%

c
Real number (ℝ)

Missing  Skewed  Zeros 

Distinct 607
Distinct (%) 0.1%
Missing 6378728
Missing (%) 91.1%
Infinite 0
Infinite (%) 0.0%
Mean 2.9494839
Minimum 0
Maximum 2433
Zeros 383839
Zeros (%) 5.5%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 1
95-th percentile 10
Maximum 2433
Range 2433
Interquartile range (IQR) 1

Descriptive statistics

Standard deviation 25.291387
Coefficient of variation (CV) 8.5748517
Kurtosis 2944.789
Mean 2.9494839
Median Absolute Deviation (MAD) 0
Skewness 46.180789
Sum 1833877
Variance 639.65425
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
0 383839 5.5%
1 86677 1.2%
2 42626 0.6%
3 24844 0.4%
4 16263 0.2%
5 11178 0.2%
6 8166 0.1%
7 6158 0.1%
8 4874 0.1%
9 3983 0.1%
Other values (597) 33154 0.5%
(Missing) 6378728 91.1%

Value Count Frequency (%)
0 383839 5.5%
1 86677 1.2%
2 42626 0.6%
3 24844 0.4%
4 16263 0.2%
5 11178 0.2%
6 8166 0.1%
7 6158 0.1%
8 4874 0.1%
9 3983 0.1%

Value Count Frequency (%)
2433 1 < 0.1%
2400 1 < 0.1%
2365 1 < 0.1%
2329 1 < 0.1%
2324 1 < 0.1%
2256 1 < 0.1%
2229 1 < 0.1%
2214 1 < 0.1%
2180 1 < 0.1%
2138 1 < 0.1%

d
Real number (ℝ)

Missing 

Distinct 2283
Distinct (%) 0.4%
Missing 6378728
Missing (%) 91.1%
Infinite 0
Infinite (%) 0.0%
Mean 12110.258
Minimum 10675
Maximum 15154
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 10675
5-th percentile 11147
Q1 11340
median 11662
Q3 13017
95-th percentile 13472
Maximum 15154
Range 4479
Interquartile range (IQR) 1677

Descriptive statistics

Standard deviation 905.0035
Coefficient of variation (CV) 0.074730322
Kurtosis 0.0020492053
Mean 12110.258
Median Absolute Deviation (MAD) 514
Skewness 0.8714147
Sum 7.5296984 × 10⁹
Variance 819031.34
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
11148 60824
 
0.9%
11340 60011
 
0.9%
11662 58852
 
0.8%
11577 55940
 
0.8%
12433 50641
 
0.7%
13019 46196
 
0.7%
13472 42456
 
0.6%
11147 13781
 
0.2%
11339 13318
 
0.2%
11576 12622
 
0.2%
Other values (2273) 207121
 
3.0%
(Missing) 6378728
91.1%
Minimum 10 values

Value Count Frequency (%)
10675 1
< 0.1%
10687 1
< 0.1%
10770 1
< 0.1%
10843 1
< 0.1%
10900 1
< 0.1%
10901 1
< 0.1%
10902 1
< 0.1%
10913 1
< 0.1%
10917 1
< 0.1%
10925 2
< 0.1%
Maximum 10 values

Value Count Frequency (%)
15154 154
 
< 0.1%
15153 1395
< 0.1%
15152 714
< 0.1%
15151 525
 
< 0.1%
15150 334
 
< 0.1%
15149 240
 
< 0.1%
15148 194
 
< 0.1%
15147 139
 
< 0.1%
15146 131
 
< 0.1%
15145 122
 
< 0.1%

lwr
Real number (ℝ)

Missing 

Distinct 15006
Distinct (%) 2.4%
Missing 6378728
Missing (%) 91.1%
Infinite 0
Infinite (%) 0.0%
Mean 0.13981449
Minimum 0.00134153
Maximum 21.657279
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0.00134153
5-th percentile 0.013911187
Q1 0.042079303
median 0.047706522
Q3 0.080633861
95-th percentile 0.59066964
Maximum 21.657279
Range 21.655937
Interquartile range (IQR) 0.038554557

Descriptive statistics

Standard deviation 0.27049641
Coefficient of variation (CV) 1.9346809
Kurtosis 234.72954
Mean 0.13981449
Median Absolute Deviation (MAD) 0.02205731
Skewness 8.3332671
Sum 86931.334
Variance 0.073168307
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
0.04207930329 53570
 
0.8%
0.04458263212 53380
 
0.8%
0.04770652161 51138
 
0.7%
0.04687272307 48970
 
0.7%
0.05928846718 45906
 
0.7%
0.06976383139 42109
 
0.6%
0.08063386062 38623
 
0.6%
0.02090460533 9949
 
0.1%
0.02214827468 9732
 
0.1%
0.02328601866 9260
 
0.1%
Other values (14996) 259125
 
3.7%
(Missing) 6378728
91.1%
Minimum 10 values

Value Count Frequency (%)
0.001341529984 2
< 0.1%
0.001377042144 1
< 0.1%
0.001386303225 1
< 0.1%
0.001416783902 1
< 0.1%
0.001421115887 1
< 0.1%
0.001434031858 1
< 0.1%
0.001520387968 1
< 0.1%
0.001530173917 1
< 0.1%
0.001540246981 1
< 0.1%
0.001567899446 1
< 0.1%
Maximum 10 values

Value Count Frequency (%)
21.65727862 2
< 0.1%
13.87425078 2
< 0.1%
11.31111151 1
 
< 0.1%
10.72333579 1
 
< 0.1%
9.782325697 4
< 0.1%
9.057418938 1
 
< 0.1%
8.309549972 1
 
< 0.1%
7.689418325 1
 
< 0.1%
7.675235773 1
 
< 0.1%
7.57435195 3
< 0.1%

odds_ratio
Real number (ℝ)

Missing 

Distinct 14807
Distinct (%) 6.2%
Missing 6762567
Missing (%) 96.6%
Infinite 0
Infinite (%) 0.0%
Mean 1.9091598
Minimum 0.054568098
Maximum 67.490004
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0.054568098
5-th percentile 0.36552637
Q1 0.86805772
median 1.6410821
Q3 2.3124051
95-th percentile 5.2176204
Maximum 67.490004
Range 67.435436
Interquartile range (IQR) 1.4443474

Descriptive statistics

Standard deviation 1.7351204
Coefficient of variation (CV) 0.90883979
Kurtosis 39.608329
Mean 1.9091598
Median Absolute Deviation (MAD) 0.77174846
Skewness 3.9942356
Sum 454233.03
Variance 3.0106429
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
1.64108209 9949
 
0.1%
1.738701245 9732
 
0.1%
1.828006996 9260
 
0.1%
2.312405113 8781
 
0.1%
1.860701112 8536
 
0.1%
2.720975481 7659
 
0.1%
3.14502416 6616
 
0.1%
1.156131878 3757
 
0.1%
0.8693336331 3677
 
0.1%
0.8205592596 3528
 
0.1%
Other values (14797) 166428
 
2.4%
(Missing) 6762567
96.6%
Minimum 10 values

Value Count Frequency (%)
0.05456809791 2
< 0.1%
0.05533288192 1
< 0.1%
0.05623415523 1
< 0.1%
0.05733603857 1
< 0.1%
0.05781733035 1
< 0.1%
0.05847617725 1
< 0.1%
0.06164440307 1
< 0.1%
0.0617468172 1
< 0.1%
0.06298545453 1
< 0.1%
0.06374354861 1
< 0.1%
Maximum 10 values

Value Count Frequency (%)
67.49000404 1
 
< 0.1%
42.55681534 1
 
< 0.1%
42.14805892 1
 
< 0.1%
39.32585759 1
 
< 0.1%
39.00186398 1
 
< 0.1%
38.3726715 4
< 0.1%
34.06995123 1
 
< 0.1%
33.75510211 1
 
< 0.1%
32.72349883 1
 
< 0.1%
31.51351638 2
< 0.1%

upr
Real number (ℝ)

Missing 

Distinct 14807
Distinct (%) 6.2%
Missing 6762567
Missing (%) 96.6%
Infinite 0
Infinite (%) 0.0%
Mean 80.693309
Minimum 0.25019618
Maximum 2762.9908
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0.25019618
5-th percentile 1.5377134
Q1 5.1828331
median 17.880295
Q3 143.35945
95-th percentile 272.46588
Maximum 2762.9908
Range 2762.7406
Interquartile range (IQR) 138.17662

Descriptive statistics

Standard deviation 105.87483
Coefficient of variation (CV) 1.3120645
Kurtosis 9.0860545
Mean 80.693309
Median Absolute Deviation (MAD) 15.583414
Skewness 1.9677921
Sum 19198794
Variance 11209.479
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
128.7133343 9949
 
0.1%
136.3626593 9732
 
0.1%
143.3594517 9260
 
0.1%
181.279412 8781
 
0.1%
145.9067123 8536
 
0.1%
213.2548254 7659
 
0.1%
246.4172615 6616
 
0.1%
22.21640539 3757
 
0.1%
16.70275849 3677
 
0.1%
15.76484814 3528
 
0.1%
Other values (14797) 166428
 
2.4%
(Missing) 6762567
96.6%
Minimum 10 values

Value Count Frequency (%)
0.2501961769 1
< 0.1%
0.2751686898 1
< 0.1%
0.2920582145 1
< 0.1%
0.3075935421 1
< 0.1%
0.3083758884 1
< 0.1%
0.311709364 1
< 0.1%
0.3151973965 1
< 0.1%
0.3178243271 1
< 0.1%
0.3213858444 1
< 0.1%
0.321828071 1
< 0.1%
Maximum 10 values

Value Count Frequency (%)
2762.9908 1
 
< 0.1%
1802.461975 1
 
< 0.1%
1732.307922 1
 
< 0.1%
1664.52855 1
 
< 0.1%
1642.197459 1
 
< 0.1%
1571.938528 4
< 0.1%
1442.508241 1
 
< 0.1%
1437.77005 1
 
< 0.1%
1389.395936 1
 
< 0.1%
1358.36701 2
< 0.1%

pvalue
Real number (ℝ)

Missing 

Distinct 13942
Distinct (%) 2.2%
Missing 6378728
Missing (%) 91.1%
Infinite 0
Infinite (%) 0.0%
Mean 0.43273837
Minimum 5.8866777 × 10⁻⁸⁷
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 5.8866777 × 10⁻⁸⁷
5-th percentile 0.072220807
Q1 0.26875983
median 0.35360134
Q3 0.46529885
95-th percentile 1
Maximum 1
Range 1
Interquartile range (IQR) 0.19653902

Descriptive statistics

Standard deviation 0.28001915
Coefficient of variation (CV) 0.64708649
Kurtosis 0.078877221
Mean 0.43273837
Median Absolute Deviation (MAD) 0.084841511
Skewness 1.0632428
Sum 269060.27
Variance 0.078410725
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
1 98602
 
1.4%
0.3786299537 53570
 
0.8%
0.3651326839 53380
 
0.8%
0.3495817066 51138
 
0.7%
0.35360134 48970
 
0.7%
0.3019090399 45906
 
0.7%
0.2687598293 42109
 
0.6%
0.2412705564 38623
 
0.6%
0.5126808458 8781
 
0.1%
0.4652988517 7659
 
0.1%
Other values (13932) 173024
 
2.5%
(Missing) 6378728
91.1%
Minimum 10 values

Value Count Frequency (%)
5.886677744 × 10⁻⁸⁷ 1
< 0.1%
3.108247758 × 10⁻⁵¹ 1
< 0.1%
9.940723947 × 10⁻⁴⁰ 1
< 0.1%
8.709728709 × 10⁻²⁹ 1
< 0.1%
4.53904082 × 10⁻²⁶ 1
< 0.1%
3.485339269 × 10⁻²⁵ 1
< 0.1%
2.514910737 × 10⁻²² 1
< 0.1%
3.691974821 × 10⁻²¹ 1
< 0.1%
1.039725295 × 10⁻²⁰ 1
< 0.1%
6.220832436 × 10⁻²⁰ 1
< 0.1%
Maximum 10 values

Value Count Frequency (%)
1 98602
1.4%
0.9673017561 1
 
< 0.1%
0.9609244748 1
 
< 0.1%
0.957249295 1
 
< 0.1%
0.9520740118 1
 
< 0.1%
0.9489849933 1
 
< 0.1%
0.9477859839 1
 
< 0.1%
0.9466495942 1
 
< 0.1%
0.9453009213 1
 
< 0.1%
0.9448824863 1
 
< 0.1%

fdr
Real number (ℝ)

Missing 

Distinct 5179
Distinct (%) 0.6%
Missing 6184674
Missing (%) 88.3%
Infinite 0
Infinite (%) 0.0%
Mean 0.60546731
Minimum 6.373506 × 10⁻⁸³
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 6.373506 × 10⁻⁸³
5-th percentile 7.9748308 × 10⁻¹⁰
Q1 0.69389984
median 0.69389984
Q3 0.69389984
95-th percentile 1
Maximum 1
Range 1
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 0.32183137
Coefficient of variation (CV) 0.53154212
Kurtosis -0.29720366
Mean 0.60546731
Median Absolute Deviation (MAD) 0
Skewness -0.95189962
Sum 493949.92
Variance 0.10357543
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Common Values

Value Count Frequency (%)
0.6938998431 434281
 
6.2%
1 110645
 
1.6%
0.8170273935 10111
 
0.1%
0.7838479333 8608
 
0.1%
0.751372918 7286
 
0.1%
0.8719330968 5806
 
0.1%
0.9755106201 2461
 
< 0.1%
0.9297289142 2204
 
< 0.1%
5.240879432 × 10⁻⁸ 2167
 
< 0.1%
0.8829791563 2130
 
< 0.1%
Other values (5169) 230117
 
3.3%
(Missing) 6184674
88.3%
Minimum 10 values

Value Count Frequency (%)
6.373505994 × 10⁻⁸³ 1
< 0.1%
2.234559099 × 10⁻⁴⁷ 1
< 0.1%
5.689899432 × 10⁻³⁶ 1
< 0.1%
3.143341091 × 10⁻²⁵ 1
< 0.1%
1.411407675 × 10⁻²² 1
< 0.1%
1.040554407 × 10⁻²¹ 1
< 0.1%
6.384186158 × 10⁻¹⁹ 1
< 0.1%
8.038984721 × 10⁻¹⁹ 197
< 0.1%
1.691303207 × 10⁻¹⁸ 197
< 0.1%
2.64633239 × 10⁻¹⁸ 197
< 0.1%
Maximum 10 values

Value Count Frequency (%)
1 110645
1.6%
0.9999064712 12
 
< 0.1%
0.9998377323 1
 
< 0.1%
0.999628494 1
 
< 0.1%
0.9987463441 12
 
< 0.1%
0.9986199302 2
 
< 0.1%
0.9984328232 1
 
< 0.1%
0.9979517332 1
 
< 0.1%
0.9973352938 1
 
< 0.1%
0.9973306005 1
 
< 0.1%

null_99
Categorical

Missing  Uniform 

Distinct 7
Distinct (%) 100.0%
Missing 7000483
Missing (%) > 99.9%
Memory size 53.4 MiB
2.6549017076639285
2.342474084282697
2.384844260845914
2.6989566909507974
3.2279419146896537
Other values (2)

Length

Max length 18
Median length 17
Mean length 17.428571
Min length 17

Characters and Unicode

Total characters 122
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
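
As a rough illustration of the length and character figures above, the sketch below recomputes them from the seven `null_99` values listed in this report. It uses Python's standard `unicodedata` module to look up each character's general category; that choice is an assumption made for illustration and is not necessarily how the report generator resolves character properties (the report itself shows a single "(unknown)" category).

```python
from collections import Counter
import unicodedata

# The seven distinct null_99 values exactly as listed in the report (stored as text).
values = [
    "2.6549017076639285",
    "2.342474084282697",
    "2.384844260845914",
    "2.6989566909507974",
    "3.2279419146896537",
    "3.807943590739353",
    "4.397453705122039",
]

lengths = [len(v) for v in values]
all_chars = "".join(values)

total_chars = len(all_chars)        # compare with "Total characters"
char_counts = Counter(all_chars)    # compare with "Most occurring characters"
# General category per character, e.g. "Nd" (decimal digit) or "Po" (other punctuation).
category_counts = Counter(unicodedata.category(ch) for ch in all_chars)

print("lengths: min", min(lengths), "max", max(lengths),
      "mean", round(sum(lengths) / len(lengths), 6))
print("total characters:", total_chars, "distinct:", len(char_counts))
print("top characters:", char_counts.most_common(3))
print("general categories:", dict(category_counts))
```

Because the values are stored as text, every statistic here describes string length and character makeup rather than the numeric magnitude of the values.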

Unique

Unique 7
Unique (%) 100.0%

Sample

1st row 2.6549017076639285
2nd row 2.342474084282697
3rd row 2.384844260845914
4th row 2.6989566909507974
5th row 3.2279419146896537

Common Values

Value Count Frequency (%)
2.6549017076639285 1
 
< 0.1%
2.342474084282697 1
 
< 0.1%
2.384844260845914 1
 
< 0.1%
2.6989566909507974 1
 
< 0.1%
3.2279419146896537 1
 
< 0.1%
3.807943590739353 1
 
< 0.1%
4.397453705122039 1
 
< 0.1%
(Missing) 7000483
> 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
2.6549017076639285 1
14.3%
2.342474084282697 1
14.3%
2.384844260845914 1
14.3%
2.6989566909507974 1
14.3%
3.2279419146896537 1
14.3%
3.807943590739353 1
14.3%
4.397453705122039 1
14.3%

Most occurring characters

Value Count Frequency (%)
9 17
13.9%
4 16
13.1%
2 13
10.7%
3 13
10.7%
7 12
9.8%
0 10
8.2%
6 10
8.2%
5 10
8.2%
8 9
7.4%
. 7
5.7%

Most occurring categories

Value Count Frequency (%)
(unknown) 122
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 122
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 122
100.0%

safetyreportid
Text

Missing 

Distinct 264444
Distinct (%) 11.4%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB

Length

Max length 9
Median length 8
Mean length 8.1550819
Min length 7

Characters and Unicode

Total characters 18971844
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 44773
Unique (%) 1.9%

Sample

1st row 10003357
2nd row 10003357
3rd row 10003357
4th row 10003357
5th row 10003357
Value Count Frequency (%)
15997294 3696
 
0.2%
15759495 2112
 
0.1%
15333886 2112
 
0.1%
16008869 2058
 
0.1%
15614390 1920
 
0.1%
8439111-2 1700
 
0.1%
14539518 1518
 
0.1%
5528013-7 1271
 
0.1%
10487906 1260
 
0.1%
15954362 1222
 
0.1%
Other values (264434) 2307514
99.2%

Most occurring characters

Value Count Frequency (%)
1 2990474
15.8%
5 1869073
9.9%
4 1803500
9.5%
3 1693078
8.9%
9 1691331
8.9%
2 1678102
8.8%
0 1664041
8.8%
8 1631391
8.6%
7 1625053
8.6%
6 1601195
8.4%
Other values (2) 724606
 
3.8%

Most occurring categories

Value Count Frequency (%)
(unknown) 18971844
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 18971844
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 18971844
100.0%

sex
Categorical

Missing 

Distinct 2
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
Female 1235999
Male 1090384

Length

Max length 6
Median length 6
Mean length 5.0625929
Min length 4

Characters and Unicode

Total characters 11777530
Distinct characters 6
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Male
2nd row Male
3rd row Male
4th row Male
5th row Male

Common Values

Value Count Frequency (%)
Female 1235999
 
17.7%
Male 1090384
 
15.6%
(Missing) 4674107
66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
female 1235999
53.1%
male 1090384
46.9%
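
The two frequency tables for `sex` report different percentages for the same counts because they appear to use different denominators: all rows in the first, non-missing rows in the second. The sketch below reproduces both sets of figures under that assumption; the total of 7000490 rows is taken from the dataset overview, and the variable names are illustrative.

```python
# Counts copied from the report for variable `sex`; names are illustrative.
n_total = 7_000_490      # "Number of observations" in the dataset overview
n_missing = 4_674_107    # "(Missing)"
counts = {"Female": 1_235_999, "Male": 1_090_384}

n_present = sum(counts.values())   # rows with a recorded value

# Share of all rows (first table) versus share of non-missing rows (second table).
share_of_all = {k: round(100 * v / n_total, 1) for k, v in counts.items()}
share_of_present = {k: round(100 * v / n_present, 1) for k, v in counts.items()}

print("present + missing equals total:", n_present + n_missing == n_total)
print("share of all rows:", share_of_all)
print("share of non-missing rows:", share_of_present)
```

When comparing categories across variables with different missing rates, the non-missing share is usually the more meaningful of the two.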

Most occurring characters

Value Count Frequency (%)
e 3562382
30.2%
a 2326383
19.8%
l 2326383
19.8%
F 1235999
 
10.5%
m 1235999
 
10.5%
M 1090384
 
9.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 11777530
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 11777530
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 11777530
100.0%

reporter_qualification
Categorical

Missing 

Distinct 5
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
Physician 782542
Other health professional 716161
Consumer or non-health professional 685768
Pharmacist 104806
Lawyer 37106

Length

Max length 35
Median length 25
Mean length 21.586935
Min length 6

Characters and Unicode

Total characters 50219479
Distinct characters 23
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Other health professional
2nd row Other health professional
3rd row Other health professional
4th row Other health professional
5th row Other health professional

Common Values

Value Count Frequency (%)
Physician 782542
 
11.2%
Other health professional 716161
 
10.2%
Consumer or non-health professional 685768
 
9.8%
Pharmacist 104806
 
1.5%
Lawyer 37106
 
0.5%
(Missing) 4674107
66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
professional 1401929
24.1%
physician 782542
13.5%
other 716161
12.3%
health 716161
12.3%
consumer 685768
11.8%
or 685768
11.8%
non-health 685768
11.8%
pharmacist 104806
 
1.8%
lawyer 37106
 
0.6%

Most occurring characters

Value Count Frequency (%)
o 4861162
9.7%
h 4407367
 
8.8%
s 4376974
 
8.7%
e 4242893
 
8.4%
n 4241775
 
8.4%
a 3833118
 
7.6%
r 3631538
 
7.2%
(space) 3489626
 
6.9%
i 3071819
 
6.1%
l 2803858
 
5.6%
Other values (13) 11259349
22.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 50219479
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 50219479
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 50219479
100.0%

receive_date
Date

Missing 

Distinct 4770
Distinct (%) 0.2%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
Minimum 1994-02-02 00:00:00
Maximum 2019-03-31 00:00:00
Invalid dates 0
Invalid dates (%) 0.0%
Histogram with fixed size bins (bins=50)

XA
Real number (ℝ)

Missing  Zeros 

Distinct 12
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Infinite 0
Infinite (%) 0.0%
Mean 0.65569642
Minimum 0
Maximum 11
Zeros 1621810
Zeros (%) 23.2%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 1
95-th percentile 3
Maximum 11
Range 11
Interquartile range (IQR) 1

Descriptive statistics

Standard deviation 1.3875397
Coefficient of variation (CV) 2.1161313
Kurtosis 13.272512
Mean 0.65569642
Median Absolute Deviation (MAD) 0
Skewness 3.2488564
Sum 1525401
Variance 1.9252664
Monotonicity Not monotonic
Histogram with fixed size bins (bins=12)
Common Values

Value Count Frequency (%)
0 1621810
 
23.2%
1 353477
 
5.0%
2 162982
 
2.3%
3 76611
 
1.1%
4 47470
 
0.7%
5 21878
 
0.3%
6 14480
 
0.2%
7 11448
 
0.2%
10 6596
 
0.1%
9 5156
 
0.1%
Other values (2) 4475
 
0.1%
(Missing) 4674107
66.8%
Minimum 10 values

Value Count Frequency (%)
0 1621810
23.2%
1 353477
 
5.0%
2 162982
 
2.3%
3 76611
 
1.1%
4 47470
 
0.7%
5 21878
 
0.3%
6 14480
 
0.2%
7 11448
 
0.2%
8 3916
 
0.1%
9 5156
 
0.1%
Maximum 10 values

Value Count Frequency (%)
11 559
 
< 0.1%
10 6596
 
0.1%
9 5156
 
0.1%
8 3916
 
0.1%
7 11448
 
0.2%
6 14480
 
0.2%
5 21878
 
0.3%
4 47470
 
0.7%
3 76611
1.1%
2 162982
2.3%

XB
Categorical

Imbalance  Missing 

Distinct 6
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 2057225
1.0 212455
2.0 44962
4.0 7694
3.0 4012

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2057225
29.4%
1.0 212455
 
3.0%
2.0 44962
 
0.6%
4.0 7694
 
0.1%
3.0 4012
 
0.1%
5.0 35
 
< 0.1%
(Missing) 4674107
66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 2057225
88.4%
1.0 212455
 
9.1%
2.0 44962
 
1.9%
4.0 7694
 
0.3%
3.0 4012
 
0.2%
5.0 35
 
< 0.1%

Most occurring characters

Value Count Frequency (%)
0 4383608
62.8%
. 2326383
33.3%
1 212455
 
3.0%
2 44962
 
0.6%
4 7694
 
0.1%
3 4012
 
0.1%
5 35
 
< 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149
100.0%

XC
Real number (ℝ)

Missing  Zeros 

Distinct 11
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Infinite 0
Infinite (%) 0.0%
Mean 0.31425522
Minimum 0
Maximum 10
Zeros 1931835
Zeros (%) 27.6%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 2
Maximum 10
Range 10
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 0.88681187
Coefficient of variation (CV) 2.821948
Kurtosis 24.397496
Mean 0.31425522
Median Absolute Deviation (MAD) 0
Skewness 4.2289174
Sum 731078
Variance 0.7864353
Monotonicity Not monotonic
Histogram with fixed size bins (bins=11)
Common Values

Value Count Frequency (%)
0 1931835
27.6%
1 221139
 
3.2%
2 92689
 
1.3%
3 37441
 
0.5%
4 25176
 
0.4%
5 7811
 
0.1%
6 5485
 
0.1%
7 2455
 
< 0.1%
10 1700
 
< 0.1%
8 484
 
< 0.1%
(Missing) 4674107
66.8%
Minimum 10 values

Value Count Frequency (%)
0 1931835
27.6%
1 221139
 
3.2%
2 92689
 
1.3%
3 37441
 
0.5%
4 25176
 
0.4%
5 7811
 
0.1%
6 5485
 
0.1%
7 2455
 
< 0.1%
8 484
 
< 0.1%
9 168
 
< 0.1%
Maximum 10 values

Value Count Frequency (%)
10 1700
 
< 0.1%
9 168
 
< 0.1%
8 484
 
< 0.1%
7 2455
 
< 0.1%
6 5485
 
0.1%
5 7811
 
0.1%
4 25176
 
0.4%
3 37441
 
0.5%
2 92689
1.3%
1 221139
3.2%

XD
Categorical

Imbalance  Missing 

Distinct 7
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 1851235
1.0 410125
2.0 48958
3.0 12473
4.0 2176
Other values (2) 1416

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 8
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 1851235
 
26.4%
1.0 410125
 
5.9%
2.0 48958
 
0.7%
3.0 12473
 
0.2%
4.0 2176
 
< 0.1%
5.0 1251
 
< 0.1%
6.0 165
 
< 0.1%
(Missing) 4674107
66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 1851235
79.6%
1.0 410125
 
17.6%
2.0 48958
 
2.1%
3.0 12473
 
0.5%
4.0 2176
 
0.1%
5.0 1251
 
0.1%
6.0 165
 
< 0.1%

Most occurring characters

Value Count Frequency (%)
0 4177618
59.9%
. 2326383
33.3%
1 410125
 
5.9%
2 48958
 
0.7%
3 12473
 
0.2%
4 2176
 
< 0.1%
5 1251
 
< 0.1%
6 165
 
< 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149
100.0%

XG
Categorical

Imbalance  Missing 

Distinct 5
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 2093389
1.0 215437
2.0 16416
3.0 641
5.0 500

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 6
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2093389
29.9%
1.0 215437
 
3.1%
2.0 16416
 
0.2%
3.0 641
 
< 0.1%
5.0 500
 
< 0.1%
(Missing) 4674107
66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 2093389
90.0%
1.0 215437
 
9.3%
2.0 16416
 
0.7%
3.0 641
 
< 0.1%
5.0 500
 
< 0.1%

Most occurring characters

Value Count Frequency (%)
0 4419772
63.3%
. 2326383
33.3%
1 215437
 
3.1%
2 16416
 
0.2%
3 641
 
< 0.1%
5 500
 
< 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149
100.0%

XH
Categorical

Imbalance  Missing 

Distinct 5
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 1708771
1.0 459754
2.0 124040
3.0 27853
4.0 5965

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 6
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1.0
2nd row 1.0
3rd row 1.0
4th row 1.0
5th row 1.0

Common Values

Value Count Frequency (%)
0.0 1708771
 
24.4%
1.0 459754
 
6.6%
2.0 124040
 
1.8%
3.0 27853
 
0.4%
4.0 5965
 
0.1%
(Missing) 4674107
66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 1708771
73.5%
1.0 459754
 
19.8%
2.0 124040
 
5.3%
3.0 27853
 
1.2%
4.0 5965
 
0.3%

Most occurring characters

Value Count Frequency (%)
0 4035154
57.8%
. 2326383
33.3%
1 459754
 
6.6%
2 124040
 
1.8%
3 27853
 
0.4%
4 5965
 
0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149
100.0%

XJ
Real number (ℝ)

Missing  Zeros 

Distinct 14
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Infinite 0
Infinite (%) 0.0%
Mean 0.4788846
Minimum 0
Maximum 13
Zeros 1775841
Zeros (%) 25.4%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 3
Maximum 13
Range 13
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 1.1490155
Coefficient of variation (CV) 2.3993577
Kurtosis 17.782648
Mean 0.4788846
Median Absolute Deviation (MAD) 0
Skewness 3.6609975
Sum 1114069
Variance 1.3202365
Monotonicity Not monotonic
Histogram with fixed size bins (bins=14)
Value Count Frequency (%)
0 1775841
 
25.4%
1 290231
 
4.1%
2 125571
 
1.8%
3 62237
 
0.9%
4 31745
 
0.5%
5 14973
 
0.2%
7 10493
 
0.1%
6 10021
 
0.1%
8 1678
 
< 0.1%
11 1448
 
< 0.1%
Other values (4) 2145
 
< 0.1%
(Missing) 4674107
66.8%
Value Count Frequency (%)
0 1775841
25.4%
1 290231
 
4.1%
2 125571
 
1.8%
3 62237
 
0.9%
4 31745
 
0.5%
5 14973
 
0.2%
6 10021
 
0.1%
7 10493
 
0.1%
8 1678
 
< 0.1%
9 1001
 
< 0.1%
Value Count Frequency (%)
13 78
 
< 0.1%
12 264
 
< 0.1%
11 1448
 
< 0.1%
10 802
 
< 0.1%
9 1001
 
< 0.1%
8 1678
 
< 0.1%
7 10493
 
0.1%
6 10021
 
0.1%
5 14973
0.2%
4 31745
0.5%
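
The XJ summary statistics can be cross-checked against the value counts listed above. The sketch below is a minimal illustration in plain Python, not the profiler's own code. The `counts` dictionary is transcribed from the XJ tables (the four values folded into "Other values" are 9, 10, 12 and 13). The n - 1 variance denominator and the cumulative-count percentile are assumptions that happen to reproduce the reported figures for this discrete column.

```python
# Value counts for XJ, transcribed from the frequency tables in this report.
counts = {0: 1775841, 1: 290231, 2: 125571, 3: 62237, 4: 31745,
          5: 14973, 6: 10021, 7: 10493, 8: 1678, 9: 1001,
          10: 802, 11: 1448, 12: 264, 13: 78}

n = sum(counts.values())                        # non-missing observations
total = sum(v * c for v, c in counts.items())   # reported as "Sum"
mean = total / n

# Sample variance with an n - 1 denominator (assumed, not confirmed by the report).
sum_sq = sum(v * v * c for v, c in counts.items())
variance = (sum_sq - n * mean ** 2) / (n - 1)


def percentile_from_counts(value_counts, q):
    """Smallest value whose cumulative count reaches the fraction q (0 < q <= 1)."""
    threshold = q * sum(value_counts.values())
    running = 0
    for value in sorted(value_counts):
        running += value_counts[value]
        if running >= threshold:
            return value


print(n, total, round(mean, 7), round(variance, 4))
print(percentile_from_counts(counts, 0.95))
```

The non-missing count recovered here equals the number of observations minus the reported missing count (7000490 - 4674107).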

XL
Real number (ℝ)

Missing  Zeros 

Distinct 13
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Infinite 0
Infinite (%) 0.0%
Mean 0.94838726
Minimum 0
Maximum 12
Zeros 1426884
Zeros (%) 20.4%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 1
95-th percentile 5
Maximum 12
Range 12
Interquartile range (IQR) 1

Descriptive statistics

Standard deviation 1.6191105
Coefficient of variation (CV) 1.707225
Kurtosis 4.8838488
Mean 0.94838726
Median Absolute Deviation (MAD) 0
Skewness 2.1625039
Sum 2206312
Variance 2.6215187
Monotonicity Not monotonic
Histogram with fixed size bins (bins=13)
Value Count Frequency (%)
0 1426884 20.4%
1 362830 5.2%
2 214569 3.1%
3 122483 1.7%
4 79939 1.1%
5 51489 0.7%
7 27435 0.4%
6 27415 0.4%
8 9447 0.1%
9 2107 < 0.1%
Other values (3) 1785 < 0.1%
(Missing) 4674107 66.8%
Value Count Frequency (%)
0 1426884 20.4%
1 362830 5.2%
2 214569 3.1%
3 122483 1.7%
4 79939 1.1%
5 51489 0.7%
6 27415 0.4%
7 27435 0.4%
8 9447 0.1%
9 2107 < 0.1%
Value Count Frequency (%)
12 95 < 0.1%
11 580 < 0.1%
10 1110 < 0.1%
9 2107 < 0.1%
8 9447 0.1%
7 27435 0.4%
6 27415 0.4%
5 51489 0.7%
4 79939 1.1%
3 122483 1.7%

XM
Categorical

Imbalance  Missing 

Distinct 6
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 1984186
1.0 289738
2.0 41954
3.0 9321
4.0 750

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 1984186 28.3%
1.0 289738 4.1%
2.0 41954 0.6%
3.0 9321 0.1%
4.0 750 < 0.1%
5.0 434 < 0.1%
(Missing) 4674107 66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 1984186 85.3%
1.0 289738 12.5%
2.0 41954 1.8%
3.0 9321 0.4%
4.0 750 < 0.1%
5.0 434 < 0.1%
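
The Common Values table and the Common Values (Plot) table show different percentages for the same XM counts because they appear to use different denominators: all rows in the first, non-missing rows in the second. A quick check with numbers taken from this report (the variable names are illustrative only):

```python
n_rows = 7000490        # "Number of observations" from the overview
n_missing = 4674107     # "Missing" for XM
count_zero = 1984186    # rows where XM is "0.0"

n_present = n_rows - n_missing
share_of_all = 100 * count_zero / n_rows          # basis of the Common Values table
share_of_present = 100 * count_zero / n_present   # basis of the Common Values (Plot) table

print(f"{share_of_all:.1f}% of all rows, {share_of_present:.1f}% of non-missing rows")
```

The same two denominators account for the paired percentages reported for the other categorical columns that share this missing count.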

Most occurring characters

Value Count Frequency (%)
0 4310569 61.8%
. 2326383 33.3%
1 289738 4.2%
2 41954 0.6%
3 9321 0.1%
4 750 < 0.1%
5 434 < 0.1%

XN
Real number (ℝ)

Missing  Zeros 

Distinct 19
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Infinite 0
Infinite (%) 0.0%
Mean 1.183239
Minimum 0
Maximum 18
Zeros 1263277
Zeros (%) 18.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 2
95-th percentile 5
Maximum 18
Range 18
Interquartile range (IQR) 2

Descriptive statistics

Standard deviation 1.9163878
Coefficient of variation (CV) 1.6196118
Kurtosis 10.943332
Mean 1.183239
Median Absolute Deviation (MAD) 0
Skewness 2.7071917
Sum 2752667
Variance 3.6725422
Monotonicity Not monotonic
Histogram with fixed size bins (bins=19)
Value Count Frequency (%)
0 1263277 18.0%
1 396920 5.7%
2 278106 4.0%
3 155760 2.2%
4 89910 1.3%
5 54744 0.8%
6 29638 0.4%
8 15884 0.2%
7 15769 0.2%
9 14466 0.2%
Other values (9) 11909 0.2%
(Missing) 4674107 66.8%
Value Count Frequency (%)
0 1263277 18.0%
1 396920 5.7%
2 278106 4.0%
3 155760 2.2%
4 89910 1.3%
5 54744 0.8%
6 29638 0.4%
7 15769 0.2%
8 15884 0.2%
9 14466 0.2%
Value Count Frequency (%)
18 230 < 0.1%
17 2154 < 0.1%
16 182 < 0.1%
15 543 < 0.1%
14 679 < 0.1%
13 2091 < 0.1%
12 1167 < 0.1%
11 2280 < 0.1%
10 2583 < 0.1%
9 14466 0.2%

XP
Categorical

Imbalance  Missing 

Distinct 3
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 2229563
1.0 94902
2.0 1918

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2229563 31.8%
1.0 94902 1.4%
2.0 1918 < 0.1%
(Missing) 4674107 66.8%
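
The overview flags XP as highly imbalanced (83.9%). That figure is consistent with one minus the normalized Shannon entropy of the class frequencies. The sketch below assumes this definition rather than quoting the profiler's implementation, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(class_counts):
    """Return 1 - H/log(k): 0 for a uniform split, 1 when one class holds every row."""
    k = len(class_counts)
    if k < 2:
        return 0.0
    total = sum(class_counts)
    # Shannon entropy of the observed class proportions (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)
    # Dividing by log(k) normalizes the entropy to the range [0, 1].
    return 1.0 - entropy / math.log(k)


xp_counts = [2229563, 94902, 1918]  # counts for 0.0, 1.0 and 2.0 from the table above
print(f"{imbalance_score(xp_counts):.3f}")
```

Missing values are left out of the counts, so the score describes only the non-missing rows.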

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 2229563 95.8%
1.0 94902 4.1%
2.0 1918 0.1%

Most occurring characters

Value Count Frequency (%)
0 4555946 65.3%
. 2326383 33.3%
1 94902 1.4%
2 1918 < 0.1%

XR
Categorical

Imbalance  Missing 

Distinct 10
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 1854131
1.0 286792
2.0 114389
3.0 39061
4.0 23978
Other values (5) 8032

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 1854131 26.5%
1.0 286792 4.1%
2.0 114389 1.6%
3.0 39061 0.6%
4.0 23978 0.3%
5.0 6004 0.1%
6.0 1329 < 0.1%
7.0 406 < 0.1%
9.0 199 < 0.1%
8.0 94 < 0.1%
(Missing) 4674107 66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 1854131 79.7%
1.0 286792 12.3%
2.0 114389 4.9%
3.0 39061 1.7%
4.0 23978 1.0%
5.0 6004 0.3%
6.0 1329 0.1%
7.0 406 < 0.1%
9.0 199 < 0.1%
8.0 94 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4180514 59.9%
. 2326383 33.3%
1 286792 4.1%
2 114389 1.6%
3 39061 0.6%
4 23978 0.3%
5 6004 0.1%
6 1329 < 0.1%
7 406 < 0.1%
9 199 < 0.1%

XS
Categorical

Imbalance  Missing 

Distinct 6
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 2265241
1.0 56154
2.0 4098
3.0 554
4.0 210

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2265241 32.4%
1.0 56154 0.8%
2.0 4098 0.1%
3.0 554 < 0.1%
4.0 210 < 0.1%
5.0 126 < 0.1%
(Missing) 4674107 66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 2265241 97.4%
1.0 56154 2.4%
2.0 4098 0.2%
3.0 554 < 0.1%
4.0 210 < 0.1%
5.0 126 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4591624 65.8%
. 2326383 33.3%
1 56154 0.8%
2 4098 0.1%
3 554 < 0.1%
4 210 < 0.1%
5 126 < 0.1%

XV
Categorical

Imbalance  Missing 

Distinct 4
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Memory size 53.4 MiB
0.0 2250558
1.0 67586
2.0 7430
3.0 809

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2250558 32.1%
1.0 67586 1.0%
2.0 7430 0.1%
3.0 809 < 0.1%
(Missing) 4674107 66.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 2250558 96.7%
1.0 67586 2.9%
2.0 7430 0.3%
3.0 809 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4576941 65.6%
. 2326383 33.3%
1 67586 1.0%
2 7430 0.1%
3 809 < 0.1%

polypharmacy
Real number (ℝ)

Missing 

Distinct 49
Distinct (%) < 0.1%
Missing 4674107
Missing (%) 66.8%
Infinite 0
Infinite (%) 0.0%
Mean 5.6192979
Minimum 1
Maximum 64
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
median 4
Q3 7
95-th percentile 16
Maximum 64
Range 63
Interquartile range (IQR) 5

Descriptive statistics

Standard deviation 6.1094532
Coefficient of variation (CV) 1.0872272
Kurtosis 18.941564
Mean 5.6192979
Median Absolute Deviation (MAD) 2
Skewness 3.5236648
Sum 13072639
Variance 37.325419
Monotonicity Not monotonic
Histogram with fixed size bins (bins=49)
Value Count Frequency (%)
1 411627 5.9%
2 365244 5.2%
3 301086 4.3%
4 248700 3.6%
5 195340 2.8%
6 160134 2.3%
7 120414 1.7%
8 98368 1.4%
9 72585 1.0%
10 57060 0.8%
Other values (39) 295825 4.2%
(Missing) 4674107 66.8%
Value Count Frequency (%)
1 411627 5.9%
2 365244 5.2%
3 301086 4.3%
4 248700 3.6%
5 195340 2.8%
6 160134 2.3%
7 120414 1.7%
8 98368 1.4%
9 72585 1.0%
10 57060 0.8%
Value Count Frequency (%)
64 2112 < 0.1%
53 106 < 0.1%
51 663 < 0.1%
48 432 < 0.1%
47 1034 < 0.1%
46 1012 < 0.1%
45 135 < 0.1%
44 6424 0.1%
43 86 < 0.1%
42 3066 < 0.1%
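
The "Other values (39)" row for polypharmacy is the remainder left after the ten most frequent values. A minimal check using the counts listed above (variable names are illustrative):

```python
n_rows, n_missing = 7000490, 4674107
# Counts for polypharmacy values 1 through 10, as listed in the table.
top10 = [411627, 365244, 301086, 248700, 195340,
         160134, 120414, 98368, 72585, 57060]
n_distinct = 49

other_count = (n_rows - n_missing) - sum(top10)   # rows outside the top ten
other_distinct = n_distinct - len(top10)          # distinct values outside the top ten
print(other_distinct, other_count)
```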

atc1_concept_name
Categorical

Missing 

Distinct 14
Distinct (%) 1.3%
Missing 6999412
Missing (%) > 99.9%
Memory size 53.4 MiB
NERVOUS SYSTEM 175
ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS 153
ALIMENTARY TRACT AND METABOLISM 124
CARDIOVASCULAR SYSTEM 124
ANTIINFECTIVES FOR SYSTEMIC USE 114
Other values (9) 388

Length

Max length 63
Median length 42
Mean length 26.976809
Min length 7

Characters and Unicode

Total characters 29081
Distinct characters 26
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ALIMENTARY TRACT AND METABOLISM
2nd row BLOOD AND BLOOD FORMING ORGANS
3rd row CARDIOVASCULAR SYSTEM
4th row DERMATOLOGICALS
5th row GENITO URINARY SYSTEM AND SEX HORMONES

Common Values

Value Count Frequency (%)
NERVOUS SYSTEM 175 < 0.1%
ANTINEOPLASTIC AND IMMUNOMODULATING AGENTS 153 < 0.1%
ALIMENTARY TRACT AND METABOLISM 124 < 0.1%
CARDIOVASCULAR SYSTEM 124 < 0.1%
ANTIINFECTIVES FOR SYSTEMIC USE 114 < 0.1%
DERMATOLOGICALS 67 < 0.1%
GENITO URINARY SYSTEM AND SEX HORMONES 52 < 0.1%
BLOOD AND BLOOD FORMING ORGANS 50 < 0.1%
RESPIRATORY SYSTEM 49 < 0.1%
SENSORY ORGANS 46 < 0.1%
Other values (4) 124 < 0.1%
(Missing) 6999412 > 99.9%
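
For atc1_concept_name, the reported Distinct (%) and Mean length are consistent with being computed over the non-missing rows only, not over all observations. A small check with the figures from this section (variable names are illustrative):

```python
n_rows = 7000490
n_missing = 6999412         # "Missing" for atc1_concept_name
n_distinct = 14             # "Distinct"
total_characters = 29081    # "Total characters"

n_present = n_rows - n_missing
distinct_pct = 100 * n_distinct / n_present     # reported as "Distinct (%)"
mean_length = total_characters / n_present      # reported as "Mean length"
print(n_present, round(distinct_pct, 1), round(mean_length, 6))
```

The non-missing count also equals the sum of the counts in the Common Values table.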

Length

Histogram of lengths of the category
Value Count Frequency (%)
system 445 12.9%
and 430 12.5%
nervous 175 5.1%
antineoplastic 153 4.4%
immunomodulating 153 4.4%
agents 153 4.4%
systemic 146 4.2%
alimentary 124 3.6%
metabolism 124 3.6%
tract 124 3.6%
Other values (24) 1423 41.2%

Most occurring characters

Value Count Frequency (%)
S 2920 10.0%
A 2581 8.9%
(space) 2372 8.2%
E 2293 7.9%
N 2287 7.9%
T 2267 7.8%
I 1980 6.8%
O 1979 6.8%
M 1700 5.8%
R 1560 5.4%
Other values (16) 7142 24.6%

raw_code
Text

Missing 

Distinct 15
Distinct (%) 100.0%
Missing 7000475
Missing (%) > 99.9%
Memory size 53.4 MiB

Length

Max length 3
Median length 2
Mean length 2.0666667
Min length 2

Characters and Unicode

Total characters 31
Distinct characters 15
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 15
Unique (%) 100.0%

Sample

1st row XA
2nd row XB
3rd row XC
4th row XD
5th row XG
Value Count Frequency (%)
xa 1 6.7%
xb 1 6.7%
xc 1 6.7%
xd 1 6.7%
xg 1 6.7%
xh 1 6.7%
xj 1 6.7%
xl 1 6.7%
xm 1 6.7%
xn 1 6.7%
Other values (5) 5 33.3%

Most occurring characters

Value Count Frequency (%)
X 15 48.4%
A 2 6.5%
N 2 6.5%
C 1 3.2%
D 1 3.2%
G 1 3.2%
B 1 3.2%
H 1 3.2%
J 1 3.2%
L 1 3.2%
Other values (5) 5 16.1%

gene_symbol
Text

Missing 

Distinct 1253
Distinct (%) 0.6%
Missing 6792831
Missing (%) 97.0%
Memory size 53.4 MiB

Length

Max length 15
Median length 12
Mean length 5.4613381
Min length 2

Characters and Unicode

Total characters 1134096
Distinct characters 37
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 343
Unique (%) 0.2%

Sample

1st row CYP3A5
2nd row CYP3A7
3rd row CYP3A7-CYP3A51P
4th row CYP3A5
5th row CYP3A7
Value Count Frequency (%)
ids 1981 1.0%
ugt1a1 1896 0.9%
slc8a1 1794 0.9%
cyp3a4 1762 0.8%
cyp2c9 1740 0.8%
slc6a2 1638 0.8%
gls 1585 0.8%
ugt1a9 1442 0.7%
ugt1a8 1400 0.7%
pou2f2 1387 0.7%
Other values (1243) 191034 92.0%

Most occurring characters

Value Count Frequency (%)
A 133590 11.8%
1 119143 10.5%
C 118378 10.4%
S 82649 7.3%
L 68981 6.1%
2 66779 5.9%
P 58834 5.2%
T 45093 4.0%
D 34021 3.0%
R 31408 2.8%
Other values (27) 375220 33.1%

type
Categorical

Missing 

Distinct 4
Distinct (%) < 0.1%
Missing 6987933
Missing (%) 99.8%
Memory size 53.4 MiB
enzyme 5096
target 4707
transporter 2452
carrier 302

Length

Max length 11
Median length 6
Mean length 7.0003982
Min length 6

Characters and Unicode

Total characters 87904
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row enzyme
2nd row enzyme
3rd row enzyme
4th row enzyme
5th row enzyme

Common Values

Value Count Frequency (%)
enzyme 5096 0.1%
target 4707 0.1%
transporter 2452 < 0.1%
carrier 302 < 0.1%
(Missing) 6987933 99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
enzyme 5096 40.6%
target 4707 37.5%
transporter 2452 19.5%
carrier 302 2.4%

Most occurring characters

Value Count Frequency (%)
e 17653 20.1%
t 14318 16.3%
r 12969 14.8%
n 7548 8.6%
a 7461 8.5%
z 5096 5.8%
m 5096 5.8%
y 5096 5.8%
g 4707 5.4%
s 2452 2.8%
Other values (4) 5508 6.3%
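
The character table for type follows directly from the four category labels and their row counts. The sketch below rebuilds it with `collections.Counter`; it is an illustrative reconstruction from the counts in this section, not the profiler's own routine.

```python
from collections import Counter

# Row counts per category, from the Common Values table for "type".
type_counts = {"enzyme": 5096, "target": 4707, "transporter": 2452, "carrier": 302}

char_totals = Counter()
for label, rows in type_counts.items():
    # Each row contributes every character of its label once per occurrence.
    for ch, per_label in Counter(label).items():
        char_totals[ch] += per_label * rows

total_characters = sum(char_totals.values())
for ch, cnt in char_totals.most_common(5):
    print(ch, cnt, f"{100 * cnt / total_characters:.1f}%")
```

The grand total matches the reported "Total characters" of 87904, and dividing it by the 12557 non-missing rows gives the reported mean length.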

soc
Categorical

Missing 

Distinct 27
Distinct (%) < 0.1%
Missing 6933227
Missing (%) 99.0%
Memory size 53.4 MiB
Nervous system disorders 8341
General disorders and administration site conditions 6307
Gastrointestinal disorders 6038
Psychiatric disorders 5251
Skin and subcutaneous tissue disorders 5159
Other values (22) 36167

Length

Max length 67
Median length 40
Mean length 30.514012
Min length 13

Characters and Unicode

Total characters 2052464
Distinct characters 38
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Eye disorders
2nd row Eye disorders
3rd row Eye disorders
4th row Skin and subcutaneous tissue disorders
5th row Skin and subcutaneous tissue disorders

Common Values

Value Count Frequency (%)
Nervous system disorders 8341 0.1%
General disorders and administration site conditions 6307 0.1%
Gastrointestinal disorders 6038 0.1%
Psychiatric disorders 5251 0.1%
Skin and subcutaneous tissue disorders 5159 0.1%
Vascular disorders 4380 0.1%
Respiratory, thoracic and mediastinal disorders 4052 0.1%
Cardiac disorders 3832 0.1%
Infections and infestations 2973 < 0.1%
Musculoskeletal and connective tissue disorders 2866 < 0.1%
Other values (17) 18064 0.3%
(Missing) 6933227 99.0%

Length

Histogram of lengths of the category
Value Count Frequency (%)
disorders 60183 25.7%
and 31771 13.6%
system 13965 6.0%
nervous 8341 3.6%
tissue 8025 3.4%
conditions 6386 2.7%
general 6307 2.7%
site 6307 2.7%
administration 6307 2.7%
gastrointestinal 6038 2.6%
Other values (51) 80305 34.3%

Most occurring characters

Value Count Frequency (%)
s 259632 12.6%
i 194572 9.5%
r 188606 9.2%
d 178494 8.7%
(space) 166672 8.1%
e 164933 8.0%
o 141377 6.9%
n 133393 6.5%
a 130617 6.4%
t 123600 6.0%
Other values (28) 370568 18.1%

auroc
Real number (ℝ)

Missing 

Distinct 233
Distinct (%) 99.1%
Missing 7000255
Missing (%) > 99.9%
Infinite 0
Infinite (%) 0.0%
Mean 0.47671997
Minimum 0.083333333
Maximum 0.93055556
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0.083333333
5-th percentile 0.26419397
Q1 0.41988122
median 0.48092869
Q3 0.53741497
95-th percentile 0.65424383
Maximum 0.93055556
Range 0.84722222
Interquartile range (IQR) 0.11753374

Descriptive statistics

Standard deviation 0.11854243
Coefficient of variation (CV) 0.2486626
Kurtosis 2.2514665
Mean 0.47671997
Median Absolute Deviation (MAD) 0.05842869
Skewness -0.048877036
Sum 112.02919
Variance 0.014052307
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
0.4375 2 < 0.1%
0.4732142857 2 < 0.1%
0.150390625 1 < 0.1%
0.4936560729 1 < 0.1%
0.3718171296 1 < 0.1%
0.5369929453 1 < 0.1%
0.600877193 1 < 0.1%
0.515625 1 < 0.1%
0.5402777778 1 < 0.1%
0.3902116402 1 < 0.1%
Other values (223) 223 < 0.1%
(Missing) 7000255 > 99.9%
Value Count Frequency (%)
0.08333333333 1 < 0.1%
0.1111111111 1 < 0.1%
0.1385542169 1 < 0.1%
0.150390625 1 < 0.1%
0.1611111111 1 < 0.1%
0.2173032407 1 < 0.1%
0.23 1 < 0.1%
0.2352941176 1 < 0.1%
0.25 1 < 0.1%
0.2521367521 1 < 0.1%
Value Count Frequency (%)
0.9305555556 1 < 0.1%
0.9290540541 1 < 0.1%
0.7903225806 1 < 0.1%
0.75 1 < 0.1%
0.7195121951 1 < 0.1%
0.6984126984 1 < 0.1%
0.6955128205 1 < 0.1%
0.6869212963 1 < 0.1%
0.6861111111 1 < 0.1%
0.6746794872 1 < 0.1%

wt_pvalue
Real number (ℝ)

Missing 

Distinct 235
Distinct (%) 100.0%
Missing 7000255
Missing (%) > 99.9%
Infinite 0
Infinite (%) 0.0%
Mean 0.57255544
Minimum 0.0078758214
Maximum 0.99993901
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 0.0078758214
5-th percentile 0.056166847
Q1 0.32733034
median 0.59899573
Q3 0.83842706
95-th percentile 0.988492
Maximum 0.99993901
Range 0.99206319
Interquartile range (IQR) 0.51109672

Descriptive statistics

Standard deviation 0.296117
Coefficient of variation (CV) 0.51718485
Kurtosis -1.1432644
Mean 0.57255544
Median Absolute Deviation (MAD) 0.25594247
Skewness -0.20477337
Sum 134.55053
Variance 0.087685276
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
0.9999336179 1 < 0.1%
0.9999390072 1 < 0.1%
0.958090435 1 < 0.1%
0.563618466 1 < 0.1%
0.2461603291 1 < 0.1%
0.2197047828 1 < 0.1%
0.1745305563 1 < 0.1%
0.4282383109 1 < 0.1%
0.9973051037 1 < 0.1%
0.1054459042 1 < 0.1%
Other values (225) 225 < 0.1%
(Missing) 7000255 > 99.9%
Value Count Frequency (%)
0.007875821411 1 < 0.1%
0.01885874666 1 < 0.1%
0.02516231635 1 < 0.1%
0.02594356173 1 < 0.1%
0.03685739242 1 < 0.1%
0.03833789307 1 < 0.1%
0.0388565707 1 < 0.1%
0.04585921972 1 < 0.1%
0.04780904617 1 < 0.1%
0.05022964562 1 < 0.1%
Value Count Frequency (%)
0.9999390072 1 < 0.1%
0.9999336179 1 < 0.1%
0.9994172067 1 < 0.1%
0.9977326847 1 < 0.1%
0.9973051037 1 < 0.1%
0.9972529572 1 < 0.1%
0.9971500118 1 < 0.1%
0.9946565256 1 < 0.1%
0.9935559335 1 < 0.1%
0.9912441299 1 < 0.1%

ttest_statistic
Real number (ℝ)

Missing 

Distinct 235
Distinct (%) 100.0%
Missing 7000255
Missing (%) > 99.9%
Infinite 0
Infinite (%) 0.0%
Mean -2.3099737
Minimum -60.372068
Maximum 23.081928
Zeros 0
Zeros (%) 0.0%
Negative 133
Negative (%) < 0.1%
Memory size 53.4 MiB

Quantile statistics

Minimum -60.372068
5-th percentile -20.763513
Q1 -7.6673791
median -1.7185903
Q3 4.5312964
95-th percentile 14.594001
Maximum 23.081928
Range 83.453996
Interquartile range (IQR) 12.198675

Descriptive statistics

Standard deviation 11.14009
Coefficient of variation (CV) -4.8226049
Kurtosis 3.4655573
Mean -2.3099737
Median Absolute Deviation (MAD) 6.2292704
Skewness -0.93975973
Sum -542.84381
Variance 124.10161
Monotonicity Not monotonic
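
The negative coefficient of variation for ttest_statistic follows from its definition: the standard deviation divided by the mean, and the mean here is negative. The derived figures can be reproduced from the reported statistics with a few lines of arithmetic. This is a sketch using the rounded values printed above, so agreement holds only to the displayed precision.

```python
mean = -2.3099737
std = 11.14009
q1, q3 = -7.6673791, 4.5312964
minimum, maximum = -60.372068, 23.081928

cv = std / mean                   # negative because the mean is negative
variance = std ** 2               # reported as "Variance"
iqr = q3 - q1                     # reported as "Interquartile range (IQR)"
value_range = maximum - minimum   # reported as "Range"
print(round(cv, 4), round(variance, 3), round(iqr, 6), round(value_range, 6))
```

Because the sign of the CV only mirrors the sign of the mean, its magnitude is the part that describes relative spread.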
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
-37.90370916 1 < 0.1%
-39.4351703 1 < 0.1%
-9.718788705 1 < 0.1%
0.006710498014 1 < 0.1%
3.188390787 1 < 0.1%
5.099427421 1 < 0.1%
8.59195464 1 < 0.1%
0.6722984945 1 < 0.1%
-21.78521858 1 < 0.1%
16.110469 1 < 0.1%
Other values (225) 225 < 0.1%
(Missing) 7000255 > 99.9%
Value Count Frequency (%)
-60.37206756 1 < 0.1%
-39.4351703 1 < 0.1%
-39.09024241 1 < 0.1%
-37.90370916 1 < 0.1%
-28.53982044 1 < 0.1%
-25.67812136 1 < 0.1%
-22.80229218 1 < 0.1%
-22.58930136 1 < 0.1%
-21.78521858 1 < 0.1%
-21.55455481 1 < 0.1%
Value Count Frequency (%)
23.08192808 1 < 0.1%
22.92145441 1 < 0.1%
22.65311138 1 < 0.1%
20.26233897 1 < 0.1%
19.09397205 1 < 0.1%
18.62653828 1 < 0.1%
17.6888374 1 < 0.1%
17.54362698 1 < 0.1%
16.110469 1 < 0.1%
16.0293788 1 < 0.1%

ttest_pvalue
Real number (ℝ)

Missing 

Distinct 179
Distinct (%) 76.2%
Missing 7000255
Missing (%) > 99.9%
Infinite 0
Infinite (%) 0.0%
Mean 0.5707745
Minimum 2.0677695 × 10^-114
Maximum 1
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB
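
The Distinct (%) and Missing (%) rows use different denominators, which is easy to miss. A minimal sketch, assuming Distinct (%) is taken over the non-missing rows and Missing (%) over all rows (an assumption that reproduces the figures shown here). The total of 7000490 observations comes from the dataset overview, and `pct` is a hypothetical helper.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

n_obs = 7000490       # total observations (dataset overview)
n_missing = 7000255   # ttest_pvalue: Missing
n_distinct = 179      # ttest_pvalue: Distinct

n_present = n_obs - n_missing
distinct_pct = pct(n_distinct, n_present)   # share of the non-missing values
missing_pct = pct(n_missing, n_obs)         # share of all rows

print(f"present={n_present} distinct={distinct_pct:.1f}% missing={missing_pct:.4f}%")
```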

Quantile statistics

Minimum 2.0677695 × 10^-114
5-th percentile 1.0837039 × 10^-40
Q1 3.8748995 × 10^-6
median 0.9570781
Q3 1
95-th percentile 1
Maximum 1
Range 1
Interquartile range (IQR) 0.99999613

Descriptive statistics

Standard deviation 0.47112558
Coefficient of variation (CV) 0.82541455
Kurtosis -1.8587934
Mean 0.5707745
Median Absolute Deviation (MAD) 0.042921901
Skewness -0.28450892
Sum 134.13201
Variance 0.22195931
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
1 57 < 0.1%
0.4973229824 1 < 0.1%
0.0007210569409 1 < 0.1%
1.758319099 × 10^-7 1 < 0.1%
1.131634688 × 10^-17 1 < 0.1%
0.2507392894 1 < 0.1%
1.175495962 × 10^-48 1 < 0.1%
0.8472753261 1 < 0.1%
6.596486407 × 10^-70 1 < 0.1%
1.269543798 × 10^-11 1 < 0.1%
Other values (169) 169 < 0.1%
(Missing) 7000255 > 99.9%

Value Count Frequency (%)
2.067769459 × 10^-114 1 < 0.1%
3.836618405 × 10^-95 1 < 0.1%
2.847371883 × 10^-75 1 < 0.1%
6.596486407 × 10^-70 1 < 0.1%
1.030453873 × 10^-68 1 < 0.1%
2.079878132 × 10^-60 1 < 0.1%
8.404888358 × 10^-58 1 < 0.1%
5.239143488 × 10^-55 1 < 0.1%
1.955767943 × 10^-54 1 < 0.1%
1.175495962 × 10^-48 1 < 0.1%

Value Count Frequency (%)
1 57 < 0.1%
1 1 < 0.1%
1 1 < 0.1%
1 1 < 0.1%
1 1 < 0.1%
1 1 < 0.1%
1 1 < 0.1%
1 1 < 0.1%
1 1 < 0.1%
1 1 < 0.1%

atc_concept_code
Text

Missing 

Distinct 1088
Distinct (%) 1.6%
Missing 6932374
Missing (%) 99.0%
Memory size 53.4 MiB

Length

Max length 7
Median length 7
Mean length 7
Min length 7

Characters and Unicode

Total characters 476812
Distinct characters 29
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 341
Unique (%) 0.5%

Sample

1st row C09DX04
2nd row R07AX30
3rd row A01AA01
4th row A01AA04
5th row A01AB02
Value Count Frequency (%)
n05ax12 938 1.4%
n06ab10 801 1.2%
n05ax08 788 1.2%
n06ab04 776 1.1%
n03ax11 577 0.8%
n06ab03 567 0.8%
n03ax16 522 0.8%
n06ab05 518 0.8%
j01ma02 517 0.8%
n02ab03 509 0.7%
Other values (1078) 61603 90.4%

Most occurring characters

Value Count Frequency (%)
0 125259 26.3%
A 64232 13.5%
1 47973 10.1%
B 26895 5.6%
2 25564 5.4%
N 23168 4.9%
3 20267 4.3%
C 19213 4.0%
X 15155 3.2%
5 15093 3.2%
Other values (19) 93993 19.7%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 476812 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

ndrugreports
Real number (ℝ)

Missing 

Distinct 483
Distinct (%) 44.4%
Missing 6999402
Missing (%) > 99.9%
Infinite 0
Infinite (%) 0.0%
Mean 553.87132
Minimum 1
Maximum 14508
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 1
5-th percentile 2
Q1 17
median 74
Q3 343
95-th percentile 2756.25
Maximum 14508
Range 14507
Interquartile range (IQR) 326

Descriptive statistics

Standard deviation 1456.6921
Coefficient of variation (CV) 2.6300189
Kurtosis 32.45299
Mean 553.87132
Median Absolute Deviation (MAD) 69
Skewness 5.093829
Sum 602612
Variance 2121951.8
Monotonicity Not monotonic
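
The gap between the mean (553.87) and the median (74), together with the skewness of 5.09, points to a long right tail. One common way to quantify that is Tukey's 1.5 × IQR rule. The report itself does not apply this rule, so the sketch below is only an illustration built on the quartiles reported above.

```python
# Quartiles copied from the ndrugreports quantile statistics.
q1, q3 = 17, 343
p95, maximum = 2756.25, 14508

iqr = q3 - q1                    # 326, as reported
lower_fence = q1 - 1.5 * iqr     # falls below the reported minimum of 1
upper_fence = q3 + 1.5 * iqr

# The 95th percentile already lies above the upper fence, so roughly 5% or
# more of the non-missing rows would be flagged by this rule.
print(lower_fence, upper_fence, p95 > upper_fence, maximum > upper_fence)
```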
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
1 37 < 0.1%
3 34 < 0.1%
2 32 < 0.1%
9 20 < 0.1%
5 19 < 0.1%
10 19 < 0.1%
11 17 < 0.1%
6 15 < 0.1%
19 15 < 0.1%
7 14 < 0.1%
Other values (473) 866 < 0.1%
(Missing) 6999402 > 99.9%

Value Count Frequency (%)
1 37 < 0.1%
2 32 < 0.1%
3 34 < 0.1%
4 13 < 0.1%
5 19 < 0.1%
6 15 < 0.1%
7 14 < 0.1%
8 12 < 0.1%
9 20 < 0.1%
10 19 < 0.1%

Value Count Frequency (%)
14508 1 < 0.1%
13385 1 < 0.1%
12625 1 < 0.1%
12320 1 < 0.1%
12078 1 < 0.1%
10111 1 < 0.1%
9756 1 < 0.1%
9116 1 < 0.1%
8312 1 < 0.1%
8212 1 < 0.1%

atc4_concept_name
Text

Missing 

Distinct 395
Distinct (%) 36.3%
Missing 6999402
Missing (%) > 99.9%
Memory size 53.4 MiB

Length

Max length 92
Median length 53
Mean length 27.363051
Min length 3

Characters and Unicode

Total characters 29771
Distinct characters 60
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 168
Unique (%) 15.4%

Sample

1st row Angiotensin II receptor blockers (ARBs), other combinations
2nd row Other respiratory system products
3rd row Caries prophylactic agents
4th row Caries prophylactic agents
5th row Antiinfectives and antiseptics for local oral treatment
Value Count Frequency (%)
other 181 5.5%
and 148 4.5%
inhibitors 145 4.4%
derivatives 121 3.7%
agents 93 2.8%
for 79 2.4%
drugs 49 1.5%
analogues 43 1.3%
plain 42 1.3%
selective 42 1.3%
Other values (512) 2349 71.4%

Most occurring characters

Value Count Frequency (%)
i 2906 9.8%
e 2753 9.2%
t 2342 7.9%
n 2305 7.7%
(space) 2204 7.4%
a 2164 7.3%
s 2098 7.0%
o 1889 6.3%
r 1868 6.3%
c 1010 3.4%
Other values (50) 8232 27.7%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 29771 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

atc4_concept_code
Text

Missing 

Distinct 418
Distinct (%) 38.4%
Missing 6999402
Missing (%) > 99.9%
Memory size 53.4 MiB

Length

Max length 5
Median length 5
Mean length 4.9558824
Min length 3

Characters and Unicode

Total characters 5392
Distinct characters 31
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 183
Unique (%) 16.8%

Sample

1st row C09DX
2nd row R07AX
3rd row A01AA
4th row A01AA
5th row A01AB
Value Count Frequency (%)
nan 24 2.2%
l01xe 23 2.1%
l01xx 18 1.7%
l04aa 15 1.4%
b01ac 10 0.9%
d07ac 10 0.9%
n03ax 10 0.9%
n06ax 9 0.8%
n06aa 9 0.8%
j05af 9 0.8%
Other values (408) 951 87.4%
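
The `nan` token in the table above, together with the minimum length of 3, suggests that 24 missing values were stored as the literal string "nan" instead of as true nulls. This is an inference from the report, not something it states. The arithmetic below checks that the inference is consistent with the reported total of 5392 characters and mean length of 4.9558824, and the hypothetical `clean_code` helper shows one way such placeholders could be turned back into missing values.

```python
from typing import Optional

n_present = 7000490 - 6999402   # non-missing rows for atc4_concept_code
n_nan_strings = 24              # count of the "nan" token in the report
n_real_codes = n_present - n_nan_strings

# Real ATC 4th-level codes such as "C09DX" have 5 characters; "nan" has 3.
total_chars = n_real_codes * 5 + n_nan_strings * len("nan")
mean_length = total_chars / n_present

def clean_code(value: Optional[str]) -> Optional[str]:
    """Map the literal string 'nan' (any case) to None; keep other values."""
    if value is None or value.strip().lower() == "nan":
        return None
    return value

sample = ["C09DX", "nan", "A01AA"]   # illustrative values only
print(total_chars, round(mean_length, 7), [clean_code(v) for v in sample])
```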

Most occurring characters

Value Count Frequency (%)
0 1025 19.0%
A 995 18.5%
B 441 8.2%
1 412 7.6%
C 372 6.9%
X 233 4.3%
D 200 3.7%
N 175 3.2%
L 155 2.9%
3 154 2.9%
Other values (21) 1230 22.8%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 5392 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

atc3_concept_name
Text

Missing 

Distinct 176
Distinct (%) 16.2%
Missing 6999402
Missing (%) > 99.9%
Memory size 53.4 MiB

Length

Max length 71
Median length 53
Mean length 28.499081
Min length 3

Characters and Unicode

Total characters 31007
Distinct characters 38
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 34
Unique (%) 3.1%

Sample

1st row ANGIOTENSIN II RECEPTOR BLOCKERS (ARBs), COMBINATIONS
2nd row OTHER RESPIRATORY SYSTEM PRODUCTS
3rd row STOMATOLOGICAL PREPARATIONS
4th row STOMATOLOGICAL PREPARATIONS
5th row STOMATOLOGICAL PREPARATIONS
Value Count Frequency (%)
and 220 6.1%
agents 217 6.1%
other 205 5.7%
for 151 4.2%
drugs 98 2.7%
use 83 2.3%
products 77 2.1%
preparations 72 2.0%
acting 62 1.7%
plain 62 1.7%
Other values (259) 2338 65.2%

Most occurring characters

Value Count Frequency (%)
A 2914 9.4%
T 2892 9.3%
I 2742 8.8%
(space) 2497 8.1%
S 2479 8.0%
E 2304 7.4%
N 2294 7.4%
O 1932 6.2%
R 1789 5.8%
C 1429 4.6%
Other values (28) 7735 24.9%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 31007 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

atc3_concept_code
Text

Missing 

Distinct 176
Distinct (%) 16.2%
Missing 6999402
Missing (%) > 99.9%
Memory size 53.4 MiB

Length

Max length 4
Median length 4
Mean length 3.9779412
Min length 3

Characters and Unicode

Total characters 4328
Distinct characters 30
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 34
Unique (%) 3.1%

Sample

1st row C09D
2nd row R07A
3rd row A01A
4th row A01A
5th row A01A
Value Count Frequency (%)
l01x 53 4.9%
j05a 39 3.6%
l04a 34 3.1%
n03a 26 2.4%
n06a 26 2.4%
nan 24 2.2%
v03a 23 2.1%
j01d 22 2.0%
b01a 21 1.9%
n05a 20 1.8%
Other values (166) 800 73.5%

Most occurring characters

Value Count Frequency (%)
0 1025 23.7%
A 670 15.5%
1 412 9.5%
B 253 5.8%
C 223 5.2%
N 174 4.0%
L 154 3.6%
3 154 3.6%
D 133 3.1%
5 122 2.8%
Other values (20) 1008 23.3%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 4328 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

atc2_concept_name
Text

Missing 

Distinct 80
Distinct (%) 7.4%
Missing 6999402
Missing (%) > 99.9%
Memory size 53.4 MiB

Length

Max length 64
Median length 41
Mean length 24.675551
Min length 3

Characters and Unicode

Total characters 26847
Distinct characters 32
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 6
Unique (%) 0.6%

Sample

1st row AGENTS ACTING ON THE RENIN-ANGIOTENSIN SYSTEM
2nd row OTHER RESPIRATORY SYSTEM PRODUCTS
3rd row STOMATOLOGICAL PREPARATIONS
4th row STOMATOLOGICAL PREPARATIONS
5th row STOMATOLOGICAL PREPARATIONS
Value Count Frequency (%)
for 209 6.9%
agents 190 6.3%
use 148 4.9%
and 143 4.7%
systemic 125 4.1%
drugs 104 3.4%
antineoplastic 90 3.0%
other 76 2.5%
system 69 2.3%
products 68 2.3%
Other values (131) 1798 59.5%

Most occurring characters

Value Count Frequency (%)
S 2509 9.3%
T 2493 9.3%
A 2480 9.2%
I 2250 8.4%
E 2101 7.8%
(space) 1932 7.2%
N 1841 6.9%
O 1739 6.5%
R 1459 5.4%
C 1262 4.7%
Other values (22) 6781 25.3%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 26847 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

atc2_concept_code
Text

Missing 

Distinct 80
Distinct (%) 7.4%
Missing 6999402
Missing (%) > 99.9%
Memory size 53.4 MiB

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 3264
Distinct characters 26
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 6
Unique (%) 0.6%

Sample

1st row C09
2nd row R07
3rd row A01
4th row A01
5th row A01
Value Count Frequency (%)
l01 90 8.3%
j01 53 4.9%
s01 44 4.0%
n05 42 3.9%
n06 39 3.6%
j05 39 3.6%
l04 34 3.1%
n02 27 2.5%
g03 26 2.4%
n03 26 2.4%
Other values (70) 668 61.4%

Most occurring characters

Value Count Frequency (%)
0 1025 31.4%
1 412 12.6%
N 174 5.3%
3 154 4.7%
L 152 4.7%
C 123 3.8%
A 123 3.8%
5 122 3.7%
J 113 3.5%
6 107 3.3%
Other values (16) 759 23.3%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 3264 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

atc1_concept_code
Categorical

Missing 

Distinct 14
Distinct (%) 1.3%
Missing 6999426
Missing (%) > 99.9%
Memory size 53.4 MiB
N 174
L 152
A 123
C 123
J 113
Other values (9) 379

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1064
Distinct characters 14
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row C
2nd row R
3rd row A
4th row A
5th row A
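
The sample rows of the ATC columns show that each level's code is a prefix of the next: C09DX04, C09DX, C09D, C09, C. A minimal sketch of that relationship, assuming every 5th-level code has exactly 7 characters, as the length statistics for atc_concept_code indicate. `atc_levels` is a hypothetical helper and not part of the dataset's tooling.

```python
# Prefix lengths for ATC levels 1 to 5, matching the code lengths in this report.
ATC_PREFIX_LENGTHS = {
    "atc1_concept_code": 1,
    "atc2_concept_code": 3,
    "atc3_concept_code": 4,
    "atc4_concept_code": 5,
    "atc_concept_code": 7,
}

def atc_levels(code: str) -> dict:
    """Split a 7-character ATC 5th-level code into its hierarchy prefixes."""
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return {column: code[:length] for column, length in ATC_PREFIX_LENGTHS.items()}

levels = atc_levels("C09DX04")
print(levels)
```

Deriving the coarser levels this way also sidesteps the literal "nan" strings that appear in the stored atc2 to atc4 code columns.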

Common Values

Value Count Frequency (%)
N 174 < 0.1%
L 152 < 0.1%
A 123 < 0.1%
C 123 < 0.1%
J 113 < 0.1%
D 66 < 0.1%
G 51 < 0.1%
B 49 < 0.1%
R 48 < 0.1%
S 45 < 0.1%
Other values (4) 120 < 0.1%
(Missing) 6999426 > 99.9%

Length

Histogram of lengths of the category

Value Count Frequency (%)
n 174 16.4%
l 152 14.3%
a 123 11.6%
c 123 11.6%
j 113 10.6%
d 66 6.2%
g 51 4.8%
b 49 4.6%
r 48 4.5%
s 45 4.2%
Other values (4) 120 11.3%

Most occurring characters

Value Count Frequency (%)
N 174 16.4%
L 152 14.3%
A 123 11.6%
C 123 11.6%
J 113 10.6%
D 66 6.2%
G 51 4.8%
B 49 4.6%
R 48 4.5%
S 45 4.2%
Other values (4) 120 11.3%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 1064 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

drugbank_id
Text

Missing 

Distinct 1566
Distinct (%) 12.7%
Missing 6988168
Missing (%) 99.8%
Memory size 53.4 MiB

Length

Max length 7
Median length 7
Mean length 7
Min length 7

Characters and Unicode

Total characters 86254
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 306
Unique (%) 2.5%

Sample

1st row DB00001
2nd row DB00002
3rd row DB00002
4th row DB00002
5th row DB00002
Value Count Frequency (%)
db00741 253 2.1%
db00783 192 1.6%
db00977 147 1.2%
db00586 145 1.2%
db01234 108 0.9%
db00281 96 0.8%
db00620 72 0.6%
db00661 66 0.5%
db00388 64 0.5%
db00316 63 0.5%
Other values (1556) 11116 90.2%

Most occurring characters

Value Count Frequency (%)
0 22481 26.1%
D 12322 14.3%
B 12322 14.3%
1 7686 8.9%
8 4334 5.0%
3 4187 4.9%
2 4146 4.8%
6 3963 4.6%
7 3915 4.5%
9 3696 4.3%
Other values (2) 7202 8.3%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 86254 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

id
Text

Missing 

Distinct 1264
Distinct (%) 10.3%
Missing 6988168
Missing (%) 99.8%
Memory size 53.4 MiB

Length

Max length 9
Median length 9
Mean length 9
Min length 9

Characters and Unicode

Total characters 110898
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 487
Unique (%) 4.0%

Sample

1st row BE0000048
2nd row BE0000767
3rd row BE0000901
4th row BE0002094
5th row BE0002095
Value Count Frequency (%)
be0002638 751 6.1%
be0001032 402 3.3%
be0002363 342 2.8%
be0002793 333 2.7%
be0003612 286 2.3%
be0003536 286 2.3%
be0002433 281 2.3%
be0002887 260 2.1%
be0002362 257 2.1%
be0003549 190 1.5%
Other values (1254) 8934 72.5%

Most occurring characters

Value Count Frequency (%)
0 45130 40.7%
B 12322 11.1%
E 12322 11.1%
3 8404 7.6%
2 5992 5.4%
6 5454 4.9%
1 4375 3.9%
4 4019 3.6%
5 3555 3.2%
7 3427 3.1%
Other values (2) 5898 5.3%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 110898 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

action
Categorical

Imbalance  Missing 

Distinct 44
Distinct (%) 0.4%
Missing 6988168
Missing (%) 99.8%
Memory size 53.4 MiB
substrate 4851
inhibitor 3362
antagonist 1347
agonist 959
inducer 607
Other values (39) 1196

Length

Max length 31
Median length 9
Mean length 8.873803
Min length 5

Characters and Unicode

Total characters 109343
Distinct characters 26
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 6
Unique (%) < 0.1%

Sample

1st row inhibitor
2nd row binder
3rd row binder
4th row binder
5th row binder

Common Values

Value Count Frequency (%)
substrate 4851 0.1%
inhibitor 3362 < 0.1%
antagonist 1347 < 0.1%
agonist 959 < 0.1%
inducer 607 < 0.1%
binder 419 < 0.1%
activator 98 < 0.1%
ligand 94 < 0.1%
potentiator 92 < 0.1%
other/unknown 72 < 0.1%
Other values (34) 421 < 0.1%
(Missing) 6988168 99.8%
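
The Imbalance alert on this variable can be read as a normalized-entropy score. The sketch below assumes the score is 1 - H / log2(k), where H is the Shannon entropy of the category counts and k is the number of categories. This formula is an assumption about how the profiler computes the figure, and `imbalance_score` is a hypothetical name. Because the 34 smaller categories are only given as a combined count, the input here is limited to the ten categories listed individually, so the result is illustrative and is not expected to match the 52.9% shown in the report's alerts.

```python
import math

def imbalance_score(counts):
    """Return 1 - H/log2(k): 0 for a uniform split, near 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log2(k)

# The ten categories listed individually in the Common Values table.
listed_counts = [4851, 3362, 1347, 959, 607, 419, 98, 94, 92, 72]

print(round(imbalance_score(listed_counts), 3))
print(imbalance_score([25, 25, 25, 25]))   # uniform split for comparison
```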

Length

Histogram of lengths of the category

Value Count Frequency (%)
substrate 4851 38.8%
inhibitor 3365 26.9%
antagonist 1348 10.8%
agonist 1023 8.2%
inducer 607 4.9%
binder 419 3.3%
modulator 105 0.8%
activator 98 0.8%
ligand 94 0.8%
potentiator 92 0.7%
Other values (35) 513 4.1%

Most occurring characters

Value Count Frequency (%)
t 17804 16.3%
i 14093 12.9%
s 12213 11.2%
r 9957 9.1%
a 9381 8.6%
b 8694 8.0%
n 8680 7.9%
o 6766 6.2%
e 6315 5.8%
u 5708 5.2%
Other values (16) 9732 8.9%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 109343 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

uniprot_id
Text

Missing 

Distinct 1213
Distinct (%) 9.8%
Missing 6988168
Missing (%) 99.8%
Memory size 53.4 MiB

Length

Max length 6
Median length 6
Mean length 6
Min length 6

Characters and Unicode

Total characters 73932
Distinct characters 36
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 464
Unique (%) 3.8%

Sample

1st row P00734
2nd row P00533
3rd row O75015
4th row P02745
5th row P02746
Value Count Frequency (%)
p08684 772 6.3%
p08183 402 3.3%
p10635 342 2.8%
p11712 333 2.7%
p33261 286 2.3%
p24462 286 2.3%
p05177 281 2.3%
p10632 260 2.1%
p20815 257 2.1%
p20813 190 1.5%
Other values (1203) 8913 72.3%

Most occurring characters

Value Count Frequency (%)
P 9374 12.7%
1 8038 10.9%
0 7326 9.9%
8 6850 9.3%
2 6577 8.9%
3 5880 8.0%
5 5320 7.2%
6 5267 7.1%
4 5105 6.9%
9 4092 5.5%
Other values (26) 10103 13.7%

Most occurring categories, scripts and blocks

Value Count Frequency (%)
(unknown) 73932 100.0%

Most frequent character per category, script and block

Same as "Most occurring characters": every character falls under the single (unknown) category, script and block.

entrez_id
Real number (ℝ)

Missing 

Distinct 1225
Distinct (%) 9.9%
Missing 6988168
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 1196684.6
Minimum 2
Maximum 1.0272456 × 10^8
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 2
5-th percentile 155
Q1 1557
median 2554
Q3 6564
95-th percentile 54658
Maximum 1.0272456 × 10^8
Range 1.0272456 × 10^8
Interquartile range (IQR) 5007

Descriptive statistics

Standard deviation 10878987
Coefficient of variation (CV) 9.0909391
Kurtosis 80.024297
Mean 1196684.6
Median Absolute Deviation (MAD) 1707.5
Skewness 9.055944
Sum 1.4745547 × 10^10
Variance 1.1835235 × 10^14
Monotonicity Not monotonic
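
entrez_id is an identifier, so the mean, variance and skewness above carry little meaning. The profiler most likely treats it as a real number because missing values forced a floating-point column; that is an inference, not something the report states. A minimal sketch of converting such values back to identifier strings before profiling, using a hypothetical `to_identifier` helper on illustrative inputs:

```python
import math
from typing import Optional

def to_identifier(value: Optional[float]) -> Optional[str]:
    """Convert a float-encoded integer ID to a string; missing stays None."""
    if value is None or math.isnan(value):
        return None
    if value != int(value):
        raise ValueError(f"not an integer-valued ID: {value!r}")
    return str(int(value))

# Two IDs that appear in this section's value tables, plus one missing value.
raw = [1576.0, float("nan"), 100861540.0]
print([to_identifier(v) for v in raw])
```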
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
1576 772 < 0.1%
5243 402 < 0.1%
1565 342 < 0.1%
1559 333 < 0.1%
1557 286 < 0.1%
1544 281 < 0.1%
1558 260 < 0.1%
1577 257 < 0.1%
1555 190 < 0.1%
100861540 143 < 0.1%
Other values (1215) 9056 0.1%
(Missing) 6988168 99.8%

Value Count Frequency (%)
2 2 < 0.1%
9 3 < 0.1%
10 11 < 0.1%
18 1 < 0.1%
19 4 < 0.1%
21 1 < 0.1%
25 6 < 0.1%
26 1 < 0.1%
27 1 < 0.1%
31 1 < 0.1%

Value Count Frequency (%)
102724560 1 < 0.1%
102724428 1 < 0.1%
100861540 143 < 0.1%
654364 4 < 0.1%
445329 18 < 0.1%
387775 4 < 0.1%
387129 1 < 0.1%
377677 1 < 0.1%
374291 1 < 0.1%
353189 7 < 0.1%

meddra_concept_name_4
Categorical

Missing 

Distinct 27
Distinct (%) 0.2%
Missing 6983552
Missing (%) 99.8%
Memory size 53.4 MiB
Investigations 1648
Injury, poisoning and procedural complications 1240
Nervous system disorders 1199
Infections and infestations 1147
Vascular disorders 967
Other values (22) 10737

Length

Max length 67
Median length 40
Mean length 32.023852
Min length 13

Characters and Unicode

Total characters 542420
Distinct characters 38
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
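
As an illustration of the kind of per-character analysis described above, here is a small Python sketch using only the standard library. It counts characters and Unicode general categories for one sample value from this column. This is not the profiler's own implementation; the standard `unicodedata` module exposes general categories but, as far as I know, not scripts or blocks, so only categories are shown.

```python
import unicodedata
from collections import Counter

# One sample value taken from this column's "Sample" listing.
value = "Blood and lymphatic system disorders"

# Frequency of each character, and of each Unicode general category
# (e.g. "Lu" = uppercase letter, "Ll" = lowercase letter, "Zs" = space separator).
char_counts = Counter(value)
category_counts = Counter(unicodedata.category(ch) for ch in value)

print(char_counts.most_common(3))
print(dict(category_counts))
```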

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Blood and lymphatic system disorders
2nd row Blood and lymphatic system disorders
3rd row Blood and lymphatic system disorders
4th row Blood and lymphatic system disorders
5th row Blood and lymphatic system disorders

Common Values

Value Count Frequency (%)
Investigations 1648 < 0.1%
Injury, poisoning and procedural complications 1240 < 0.1%
Nervous system disorders 1199 < 0.1%
Infections and infestations 1147 < 0.1%
Vascular disorders 967 < 0.1%
Gastrointestinal disorders 948 < 0.1%
Skin and subcutaneous tissue disorders 799 < 0.1%
General disorders and administration site conditions 755 < 0.1%
Neoplasms benign, malignant and unspecified (incl cysts and polyps) 749 < 0.1%
Musculoskeletal and connective tissue disorders 721 < 0.1%
Other values (17) 6765 0.1%
(Missing) 6983552 99.8%

Length

Histogram of lengths of the category
Value Count Frequency (%)
disorders 10994 17.6%
and 10296 16.5%
system 2610 4.2%
investigations 1648 2.6%
tissue 1520 2.4%
procedural 1240 2.0%
poisoning 1240 2.0%
complications 1240 2.0%
injury 1240 2.0%
nervous 1199 1.9%
Other values (51) 29225 46.8%

Most occurring characters

Value Count Frequency (%)
s 57630 10.6%
i 49043 9.0%
(space) 45514 8.4%
n 43557 8.0%
e 41721 7.7%
d 39674 7.3%
r 39605 7.3%
a 36694 6.8%
o 35588 6.6%
t 29568 5.5%
Other values (28) 123826 22.8%

Most occurring categories

Value Count Frequency (%)
(unknown) 542420 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
s 57630 10.6%
i 49043 9.0%
(space) 45514 8.4%
n 43557 8.0%
e 41721 7.7%
d 39674 7.3%
r 39605 7.3%
a 36694 6.8%
o 35588 6.6%
t 29568 5.5%
Other values (28) 123826 22.8%

Most occurring scripts

Value Count Frequency (%)
(unknown) 542420 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
s 57630 10.6%
i 49043 9.0%
(space) 45514 8.4%
n 43557 8.0%
e 41721 7.7%
d 39674 7.3%
r 39605 7.3%
a 36694 6.8%
o 35588 6.6%
t 29568 5.5%
Other values (28) 123826 22.8%

Most occurring blocks

Value Count Frequency (%)
(unknown) 542420 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
s 57630 10.6%
i 49043 9.0%
(space) 45514 8.4%
n 43557 8.0%
e 41721 7.7%
d 39674 7.3%
r 39605 7.3%
a 36694 6.8%
o 35588 6.6%
t 29568 5.5%
Other values (28) 123826 22.8%

neventreports
Real number (ℝ)

Missing 

Distinct 720
Distinct (%) 4.3%
Missing 6983549
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 74.279086
Minimum 1
Maximum 16798
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
median 6
Q3 24
95-th percentile 295
Maximum 16798
Range 16797
Interquartile range (IQR) 22

Descriptive statistics

Standard deviation 388.00587
Coefficient of variation (CV) 5.223622
Kurtosis 450.99997
Mean 74.279086
Median Absolute Deviation (MAD) 5
Skewness 16.761267
Sum 1258362
Variance 150548.56
Monotonicity Not monotonic
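
The summary values for neventreports can be checked against each other: the reported sum should equal the mean times the number of non-missing rows. A minimal sketch, assuming the total row count from the dataset overview (7000490) and the rounded figures printed above:

```python
# Consistency check for neventreports using values printed in this report.
n_rows = 7000490       # number of observations (dataset overview)
n_missing = 6983549    # missing count reported for neventreports
mean = 74.279086
reported_sum = 1258362

n_present = n_rows - n_missing        # rows that actually carry a value
reconstructed_sum = mean * n_present  # should match the reported sum

print(n_present)
print(round(reconstructed_sum))
```

The gap between the mean (about 74) and the median (6) reflects the heavy right tail also visible in the skewness and kurtosis figures.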
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
1 3789 0.1%
2 1713 < 0.1%
3 1275 < 0.1%
4 867 < 0.1%
5 689 < 0.1%
6 640 < 0.1%
7 502 < 0.1%
8 442 < 0.1%
9 378 < 0.1%
10 290 < 0.1%
Other values (710) 6356 0.1%
(Missing) 6983549 99.8%
Value Count Frequency (%)
1 3789 0.1%
2 1713 < 0.1%
3 1275 < 0.1%
4 867 < 0.1%
5 689 < 0.1%
6 640 < 0.1%
7 502 < 0.1%
8 442 < 0.1%
9 378 < 0.1%
10 290 < 0.1%
Value Count Frequency (%)
16798 1 < 0.1%
13250 1 < 0.1%
11598 1 < 0.1%
10425 1 < 0.1%
9281 1 < 0.1%
8969 1 < 0.1%
6233 1 < 0.1%
6040 1 < 0.1%
5973 1 < 0.1%
5780 1 < 0.1%

meddra_concept_class_id_1
Categorical

Constant  Missing 

Distinct 1
Distinct (%) < 0.1%
Missing 6983552
Missing (%) 99.8%
Memory size 53.4 MiB
PT 16938

Length

Max length 2
Median length 2
Mean length 2
Min length 2

Characters and Unicode

Total characters 33876
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row PT
2nd row PT
3rd row PT
4th row PT
5th row PT

Common Values

Value Count Frequency (%)
PT 16938 0.2%
(Missing) 6983552 99.8%
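
The "Constant" and "Missing" flags on this variable can be reproduced from the counts above. The sketch below is illustrative only: the rule "exactly one distinct non-missing value" is an assumption about how such a flag is derived, and `summarize` is a hypothetical helper, not part of any profiling library.

```python
def summarize(values):
    """Return (is_constant, missing_pct) for a column given as a list.

    None marks a missing cell. "Constant" is assumed here to mean
    exactly one distinct non-missing value.
    """
    present = [v for v in values if v is not None]
    missing_pct = 100.0 * (len(values) - len(present)) / len(values)
    return len(set(present)) == 1, missing_pct

# Toy column with the same shape as meddra_concept_class_id_1.
column = ["PT", None, None, "PT", None]
is_constant, missing_pct = summarize(column)
print(is_constant, f"{missing_pct:.1f}%")

# The same percentage computed from the counts reported in this section.
present, missing = 16938, 6983552
reported_missing_pct = 100.0 * missing / (present + missing)
print(f"{reported_missing_pct:.1f}%")
```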

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
pt 16938 100.0%

Most occurring characters

Value Count Frequency (%)
P 16938 50.0%
T 16938 50.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 33876 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
P 16938 50.0%
T 16938 50.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 33876 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
P 16938 50.0%
T 16938 50.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 33876 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
P 16938 50.0%
T 16938 50.0%

meddra_concept_class_id_2
Categorical

Constant  Missing 

Distinct 1
Distinct (%) < 0.1%
Missing 6983552
Missing (%) 99.8%
Memory size 53.4 MiB
HLT 16938

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 50814
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row HLT
2nd row HLT
3rd row HLT
4th row HLT
5th row HLT

Common Values

Value Count Frequency (%)
HLT 16938 0.2%
(Missing) 6983552 99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
hlt 16938 100.0%

Most occurring characters

Value Count Frequency (%)
H 16938 33.3%
L 16938 33.3%
T 16938 33.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 50814 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
H 16938 33.3%
L 16938 33.3%
T 16938 33.3%

Most occurring scripts

Value Count Frequency (%)
(unknown) 50814 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
H 16938 33.3%
L 16938 33.3%
T 16938 33.3%

Most occurring blocks

Value Count Frequency (%)
(unknown) 50814 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
H 16938 33.3%
L 16938 33.3%
T 16938 33.3%

meddra_concept_class_id_3
Categorical

Constant  Missing 

Distinct 1
Distinct (%) < 0.1%
Missing 6983552
Missing (%) 99.8%
Memory size 53.4 MiB
HLGT 16938

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 67752
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row HLGT
2nd row HLGT
3rd row HLGT
4th row HLGT
5th row HLGT

Common Values

Value Count Frequency (%)
HLGT 16938 0.2%
(Missing) 6983552 99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
hlgt 16938 100.0%

Most occurring characters

Value Count Frequency (%)
H 16938 25.0%
L 16938 25.0%
G 16938 25.0%
T 16938 25.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 67752 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
H 16938 25.0%
L 16938 25.0%
G 16938 25.0%
T 16938 25.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 67752 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
H 16938 25.0%
L 16938 25.0%
G 16938 25.0%
T 16938 25.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 67752 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
H 16938 25.0%
L 16938 25.0%
G 16938 25.0%
T 16938 25.0%

meddra_concept_class_id_4
Categorical

Constant  Missing 

Distinct 1
Distinct (%) < 0.1%
Missing 6983552
Missing (%) 99.8%
Memory size 53.4 MiB
SOC 16938

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 50814
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row SOC
2nd row SOC
3rd row SOC
4th row SOC
5th row SOC

Common Values

Value Count Frequency (%)
SOC 16938 0.2%
(Missing) 6983552 99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
soc 16938 100.0%

Most occurring characters

Value Count Frequency (%)
S 16938 33.3%
O 16938 33.3%
C 16938 33.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 50814 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
S 16938 33.3%
O 16938 33.3%
C 16938 33.3%

Most occurring scripts

Value Count Frequency (%)
(unknown) 50814 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
S 16938 33.3%
O 16938 33.3%
C 16938 33.3%

Most occurring blocks

Value Count Frequency (%)
(unknown) 50814 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
S 16938 33.3%
O 16938 33.3%
C 16938 33.3%

meddra_concept_code_1
Real number (ℝ)

Missing 

Distinct 10767
Distinct (%) 63.6%
Missing 6983552
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 10044712
Minimum 10000021
Maximum 10078675
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 10000021
5-th percentile 10005334
Q1 10024387
median 10050214
Q3 10063422
95-th percentile 10074760
Maximum 10078675
Range 78654
Interquartile range (IQR) 39035

Descriptive statistics

Standard deviation 22515.539
Coefficient of variation (CV) 0.0022415316
Kurtosis -1.0872844
Mean 10044712
Median Absolute Deviation (MAD) 16172.5
Skewness -0.42468787
Sum 1.7013733 × 10^11
Variance 5.069495 × 10^8
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
10053869 7 < 0.1%
10034872 7 < 0.1%
10004213 7 < 0.1%
10051707 7 < 0.1%
10063935 7 < 0.1%
10044689 7 < 0.1%
10069116 6 < 0.1%
10072010 6 < 0.1%
10067010 6 < 0.1%
10050361 6 < 0.1%
Other values (10757) 16872 0.2%
(Missing) 6983552 99.8%
Value Count Frequency (%)
10000021 2 < 0.1%
10000028 1 < 0.1%
10000050 2 < 0.1%
10000059 1 < 0.1%
10000060 1 < 0.1%
10000077 1 < 0.1%
10000081 1 < 0.1%
10000084 1 < 0.1%
10000087 1 < 0.1%
10000090 1 < 0.1%
Value Count Frequency (%)
10078675 2 < 0.1%
10078668 2 < 0.1%
10078659 2 < 0.1%
10078651 2 < 0.1%
10078638 5 < 0.1%
10078602 1 < 0.1%
10078581 2 < 0.1%
10078580 1 < 0.1%
10078576 2 < 0.1%
10078575 1 < 0.1%

meddra_concept_code_2
Real number (ℝ)

Missing 

Distinct 1619
Distinct (%) 9.6%
Missing 6983552
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 10029846
Minimum 10000032
Maximum 10077699
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 10000032
5-th percentile 10003818
Q1 10016462
median 10027696
Q3 10040948
95-th percentile 10068755
Maximum 10077699
Range 77667
Interquartile range (IQR) 24486

Descriptive statistics

Standard deviation 18291.903
Coefficient of variation (CV) 0.0018237472
Kurtosis -0.16890637
Mean 10029846
Median Absolute Deviation (MAD) 13090
Skewness 0.57898279
Sum 1.6988553 × 10^11
Variance 3.3459373 × 10^8
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
10021544 146 < 0.1%
10022097 126 < 0.1%
10004047 106 < 0.1%
10003057 102 < 0.1%
10018987 101 < 0.1%
10068753 92 < 0.1%
10027700 89 < 0.1%
10027682 82 < 0.1%
10012424 82 < 0.1%
10018072 82 < 0.1%
Other values (1609) 15930 0.2%
(Missing) 6983552 99.8%
Value Count Frequency (%)
10000032 21 < 0.1%
10000063 2 < 0.1%
10000072 3 < 0.1%
10000117 7 < 0.1%
10000135 2 < 0.1%
10000171 9 < 0.1%
10000178 8 < 0.1%
10000190 6 < 0.1%
10000191 2 < 0.1%
10000192 11 < 0.1%
Value Count Frequency (%)
10077699 5 < 0.1%
10077550 6 < 0.1%
10077549 3 < 0.1%
10077548 30 < 0.1%
10077547 9 < 0.1%
10077545 2 < 0.1%
10077544 1 < 0.1%
10077542 1 < 0.1%
10077540 1 < 0.1%
10077538 2 < 0.1%

meddra_concept_code_3
Real number (ℝ)

Missing 

Distinct 337
Distinct (%) 2.0%
Missing 6983552
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 10026499
Minimum 10000073
Maximum 10077546
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 10000073
5-th percentile 10001708
Q1 10014623
median 10023213
Q3 10038612
95-th percentile 10065122
Maximum 10077546
Range 77473
Interquartile range (IQR) 23989

Descriptive statistics

Standard deviation 17669.616
Coefficient of variation (CV) 0.0017622917
Kurtosis 0.16329273
Mean 10026499
Median Absolute Deviation (MAD) 11908
Skewness 0.73123265
Sum 1.6982884 × 10^11
Variance 3.1221532 × 10^8
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
10001316 560 < 0.1%
10021879 382 < 0.1%
10014982 359 < 0.1%
10069888 359 < 0.1%
10004018 282 < 0.1%
10047075 274 < 0.1%
10029305 269 < 0.1%
10022114 264 < 0.1%
10018851 249 < 0.1%
10047438 225 < 0.1%
Other values (327) 13715 0.2%
(Missing) 6983552 99.8%
Value Count Frequency (%)
10000073 45 < 0.1%
10000211 20 < 0.1%
10000485 29 < 0.1%
10000546 64 < 0.1%
10001302 5 < 0.1%
10001316 560 < 0.1%
10001353 39 < 0.1%
10001474 8 < 0.1%
10001708 146 < 0.1%
10002086 40 < 0.1%
Value Count Frequency (%)
10077546 11 < 0.1%
10077537 59 < 0.1%
10076290 26 < 0.1%
10074469 4 < 0.1%
10072990 28 < 0.1%
10071947 105 < 0.1%
10071940 65 < 0.1%
10069888 359 < 0.1%
10069782 38 < 0.1%
10069781 59 < 0.1%

meddra_concept_code_4
Real number (ℝ)

Missing 

Distinct 27
Distinct (%) 0.2%
Missing 6983552
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 10027122
Minimum 10005329
Maximum 10077536
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 10005329
5-th percentile 10007541
Q1 10019805
median 10022891
Q3 10037175
95-th percentile 10047065
Maximum 10077536
Range 72207
Interquartile range (IQR) 17370

Descriptive statistics

Standard deviation 11351.636
Coefficient of variation (CV) 0.0011320932
Kurtosis 1.0165171
Mean 10027122
Median Absolute Deviation (MAD) 6314
Skewness 0.55127251
Sum 1.6983939 × 10^11
Variance 1.2885964 × 10^8
Monotonicity Not monotonic
Histogram with fixed size bins (bins=27)
Value Count Frequency (%)
10022891 1648 < 0.1%
10022117 1240 < 0.1%
10029205 1199 < 0.1%
10021881 1147 < 0.1%
10047065 967 < 0.1%
10017947 948 < 0.1%
10040785 799 < 0.1%
10018065 755 < 0.1%
10029104 749 < 0.1%
10028395 721 < 0.1%
Other values (17) 6765 0.1%
(Missing) 6983552 99.8%
Value Count Frequency (%)
10005329 485 < 0.1%
10007541 393 < 0.1%
10010331 615 < 0.1%
10013993 98 < 0.1%
10014698 251 < 0.1%
10015919 514 < 0.1%
10017947 948 < 0.1%
10018065 755 < 0.1%
10019805 236 < 0.1%
10021428 450 < 0.1%
Value Count Frequency (%)
10077536 97 < 0.1%
10047065 967 < 0.1%
10042613 576 < 0.1%
10041244 122 < 0.1%
10040785 799 < 0.1%
10038738 672 < 0.1%
10038604 476 < 0.1%
10038359 407 < 0.1%
10037175 566 < 0.1%
10036585 365 < 0.1%

meddra_concept_id_2
Real number (ℝ)

Missing 

Distinct 1619
Distinct (%) 9.6%
Missing 6983552
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 36430341
Minimum 788073
Maximum 45885357
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 788073
5-th percentile 35202463
Q1 35802820
median 36303167
Q3 37003673
95-th percentile 37604043
Maximum 45885357
Range 45097284
Interquartile range (IQR) 1200853

Descriptive statistics

Standard deviation 3065518.4
Coefficient of variation (CV) 0.084147398
Kurtosis 97.877037
Mean 36430341
Median Absolute Deviation (MAD) 600384
Skewness -7.9705946
Sum 6.1705712 × 10^11
Variance 9.3974032 × 10^12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
35802818 146 < 0.1%
35802820 126 < 0.1%
36102898 106 < 0.1%
35802817 102 < 0.1%
37604043 101 < 0.1%
35802819 92 < 0.1%
37503992 89 < 0.1%
36002886 82 < 0.1%
37303796 82 < 0.1%
35802829 82 < 0.1%
Other values (1609) 15930 0.2%
(Missing) 6983552 99.8%
Value Count Frequency (%)
788073 23 < 0.1%
788074 4 < 0.1%
788075 5 < 0.1%
788076 2 < 0.1%
788078 1 < 0.1%
788080 1 < 0.1%
788082 1 < 0.1%
788083 2 < 0.1%
788084 9 < 0.1%
788085 30 < 0.1%
Value Count Frequency (%)
45885357 18 < 0.1%
45885356 3 < 0.1%
45885355 3 < 0.1%
45885354 2 < 0.1%
45885353 23 < 0.1%
45885352 23 < 0.1%
45885351 9 < 0.1%
45885349 13 < 0.1%
45885348 21 < 0.1%
45885347 20 < 0.1%

meddra_concept_id_3
Real number (ℝ)

Missing 

Distinct 337
Distinct (%) 2.0%
Missing 6983552
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 36501597
Minimum 788071
Maximum 45885340
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 788071
5-th percentile 35202053
Q1 35802128
median 36302182
Q3 37102294
95-th percentile 37602361
Maximum 45885340
Range 45097269
Interquartile range (IQR) 1300166

Descriptive statistics

Standard deviation 2739892.8
Coefficient of variation (CV) 0.075062272
Kurtosis 117.74614
Mean 36501597
Median Absolute Deviation (MAD) 600064
Skewness -8.6002069
Sum 6.1826405 × 10^11
Variance 7.5070125 × 10^12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
35802128 560 < 0.1%
36102149 382 < 0.1%
37302323 359 < 0.1%
42888893 359 < 0.1%
36102144 282 < 0.1%
37602361 274 < 0.1%
36702247 269 < 0.1%
36202157 264 < 0.1%
36302167 249 < 0.1%
36102154 225 < 0.1%
Other values (327) 13715 0.2%
(Missing) 6983552 99.8%
Value Count Frequency (%)
788071 59 < 0.1%
788072 11 < 0.1%
35102033 40 < 0.1%
35102034 55 < 0.1%
35102035 30 < 0.1%
35102036 56 < 0.1%
35102037 10 < 0.1%
35102038 34 < 0.1%
35102039 98 < 0.1%
35102040 12 < 0.1%
Value Count Frequency (%)
45885340 26 < 0.1%
45885339 4 < 0.1%
43053687 28 < 0.1%
42888894 65 < 0.1%
42888893 359 < 0.1%
42888892 105 < 0.1%
42888891 38 < 0.1%
42888890 59 < 0.1%
37602365 19 < 0.1%
37602364 7 < 0.1%

meddra_concept_id_4
Real number (ℝ)

Missing 

Distinct 27
Distinct (%) 0.2%
Missing 6983552
Missing (%) 99.8%
Infinite 0
Infinite (%) 0.0%
Mean 36200516
Minimum 788070
Maximum 37600000
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 788070
5-th percentile 35200000
Q1 35900000
median 36300000
Q3 36900000
95-th percentile 37600000
Maximum 37600000
Range 36811930
Interquartile range (IQR) 1000000

Descriptive statistics

Standard deviation 2772093
Coefficient of variation (CV) 0.076576063
Kurtosis 149.58382
Mean 36200516
Median Absolute Deviation (MAD) 500000
Skewness -11.926897
Sum 6.1316434 × 10^11
Variance 7.6844996 × 10^12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=27)
Value Count Frequency (%)
36300000 1648 < 0.1%
36200000 1240 < 0.1%
36700000 1199 < 0.1%
36100000 1147 < 0.1%
37600000 967 < 0.1%
35700000 948 < 0.1%
37300000 799 < 0.1%
35800000 755 < 0.1%
36600000 749 < 0.1%
36500000 721 < 0.1%
Other values (17) 6765 0.1%
(Missing) 6983552 99.8%
Value Count Frequency (%)
788070 97 < 0.1%
35100000 485 < 0.1%
35200000 393 < 0.1%
35300000 615 < 0.1%
35400000 98 < 0.1%
35500000 251 < 0.1%
35600000 514 < 0.1%
35700000 948 < 0.1%
35800000 755 < 0.1%
35900000 236 < 0.1%
Value Count Frequency (%)
37600000 967 < 0.1%
37500000 576 < 0.1%
37400000 122 < 0.1%
37300000 799 < 0.1%
37200000 672 < 0.1%
37100000 476 < 0.1%
37000000 407 < 0.1%
36900000 566 < 0.1%
36800000 365 < 0.1%
36700000 1199 < 0.1%

meddra_concept_name_1
Text

Missing 

Distinct 10768
Distinct (%) 63.6%
Missing 6983549
Missing (%) 99.8%
Memory size 53.4 MiB

Length

Max length 93
Median length 57
Mean length 21.634496
Min length 3

Characters and Unicode

Total characters 366510
Distinct characters 70
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 5906
Unique (%) 34.9%

Sample

1st row nan
2nd row nan
3rd row nan
4th row Gelatinous transformation of the bone marrow
5th row Eosinophilic granulomatosis with polyangiitis
Value Count Frequency (%)
site 891 2.1%
syndrome 711 1.7%
infection 508 1.2%
disorder 462 1.1%
increased 427 1.0%
abnormal 425 1.0%
of 384 0.9%
congenital 349 0.8%
decreased 325 0.8%
blood 307 0.7%
Other values (5573) 37447 88.7%

Most occurring characters

Value Count Frequency (%)
e 34008 9.3%
i 31298 8.5%
a 31002 8.5%
o 25615 7.0%
(space) 25295 6.9%
r 24750 6.8%
t 23930 6.5%
n 23155 6.3%
s 21736 5.9%
l 17347 4.7%
Other values (60) 108374 29.6%

Most occurring categories

Value Count Frequency (%)
(unknown) 366510 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
e 34008 9.3%
i 31298 8.5%
a 31002 8.5%
o 25615 7.0%
(space) 25295 6.9%
r 24750 6.8%
t 23930 6.5%
n 23155 6.3%
s 21736 5.9%
l 17347 4.7%
Other values (60) 108374 29.6%

Most occurring scripts

Value Count Frequency (%)
(unknown) 366510 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
e 34008 9.3%
i 31298 8.5%
a 31002 8.5%
o 25615 7.0%
(space) 25295 6.9%
r 24750 6.8%
t 23930 6.5%
n 23155 6.3%
s 21736 5.9%
l 17347 4.7%
Other values (60) 108374 29.6%

Most occurring blocks

Value Count Frequency (%)
(unknown) 366510 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
e 34008 9.3%
i 31298 8.5%
a 31002 8.5%
o 25615 7.0%
(space) 25295 6.9%
r 24750 6.8%
t 23930 6.5%
n 23155 6.3%
s 21736 5.9%
l 17347 4.7%
Other values (60) 108374 29.6%

meddra_concept_name_2
Text

Missing 

Distinct 1620
Distinct (%) 9.6%
Missing 6983549
Missing (%) 99.8%
Memory size 53.4 MiB

Length

Max length 82
Median length 62
Mean length 32.316864
Min length 3

Characters and Unicode

Total characters 547480
Distinct characters 57
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 148
Unique (%) 0.9%

Sample

1st row nan
2nd row nan
3rd row nan
4th row Marrow depression and hypoplastic anaemias
5th row Eosinophilic disorders
Value Count Frequency (%)
and 6447 9.7%
nec 5508 8.3%
disorders 3918 5.9%
infections 1947 2.9%
neoplasms 1018 1.5%
analyses 897 1.3%
procedures 842 1.3%
congenital 827 1.2%
tissue 751 1.1%
site 749 1.1%
Other values (1276) 43804 65.7%

Most occurring characters

Value Count Frequency (%)
(space) 49767 9.1%
e 45308 8.3%
i 44405 8.1%
s 44390 8.1%
a 43095 7.9%
n 39093 7.1%
o 34406 6.3%
r 33723 6.2%
t 31507 5.8%
c 23806 4.3%
Other values (47) 157980 28.9%

Most occurring categories

Value Count Frequency (%)
(unknown) 547480 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
(space) 49767 9.1%
e 45308 8.3%
i 44405 8.1%
s 44390 8.1%
a 43095 7.9%
n 39093 7.1%
o 34406 6.3%
r 33723 6.2%
t 31507 5.8%
c 23806 4.3%
Other values (47) 157980 28.9%

Most occurring scripts

Value Count Frequency (%)
(unknown) 547480 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
(space) 49767 9.1%
e 45308 8.3%
i 44405 8.1%
s 44390 8.1%
a 43095 7.9%
n 39093 7.1%
o 34406 6.3%
r 33723 6.2%
t 31507 5.8%
c 23806 4.3%
Other values (47) 157980 28.9%

Most occurring blocks

Value Count Frequency (%)
(unknown) 547480 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
(space) 49767 9.1%
e 45308 8.3%
i 44405 8.1%
s 44390 8.1%
a 43095 7.9%
n 39093 7.1%
o 34406 6.3%
r 33723 6.2%
t 31507 5.8%
c 23806 4.3%
Other values (47) 157980 28.9%

meddra_concept_name_3
Text

Missing 

Distinct 338
Distinct (%) 2.0%
Missing 6983549
Missing (%) 99.8%
Memory size 53.4 MiB

Length

Max length 86
Median length 56
Mean length 35.634673
Min length 3

Characters and Unicode

Total characters 603687
Distinct characters 53
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 3
Unique (%) < 0.1%

Sample

1st row nan
2nd row nan
3rd row nan
4th row Anaemias nonhaemolytic and marrow depression
5th row White blood cell disorders
Value Count Frequency (%)
and 7893 11.2%
disorders 7052 10.0%
nec 2836 4.0%
conditions 1529 2.2%
investigations 1417 2.0%
vascular 1299 1.8%
excl 1289 1.8%
infections 1185 1.7%
neoplasms 1019 1.4%
congenital 1001 1.4%
Other values (431) 43802 62.3%

Most occurring characters

Value Count Frequency (%)
(space) 53381 8.8%
i 52858 8.8%
e 51012 8.5%
s 50599 8.4%
n 44386 7.4%
a 43177 7.2%
r 41263 6.8%
o 39926 6.6%
t 35080 5.8%
d 32479 5.4%
Other values (43) 159526 26.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 603687 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
(space) 53381 8.8%
i 52858 8.8%
e 51012 8.5%
s 50599 8.4%
n 44386 7.4%
a 43177 7.2%
r 41263 6.8%
o 39926 6.6%
t 35080 5.8%
d 32479 5.4%
Other values (43) 159526 26.4%

Most occurring scripts

Value Count Frequency (%)
(unknown) 603687 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
(space) 53381 8.8%
i 52858 8.8%
e 51012 8.5%
s 50599 8.4%
n 44386 7.4%
a 43177 7.2%
r 41263 6.8%
o 39926 6.6%
t 35080 5.8%
d 32479 5.4%
Other values (43) 159526 26.4%

Most occurring blocks

Value Count Frequency (%)
(unknown) 603687 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
(space) 53381 8.8%
i 52858 8.8%
e 51012 8.5%
s 50599 8.4%
n 44386 7.4%
a 43177 7.2%
r 41263 6.8%
o 39926 6.6%
t 35080 5.8%
d 32479 5.4%
Other values (43) 159526 26.4%

relationship_id_12
Categorical

Constant  Missing 

Distinct 1
Distinct (%) < 0.1%
Missing 6983552
Missing (%) 99.8%
Memory size 53.4 MiB
Is a 16938

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 67752
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Is a
2nd row Is a
3rd row Is a
4th row Is a
5th row Is a

Common Values

Value Count Frequency (%)
Is a 16938 0.2%
(Missing) 6983552 99.8%
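
The constant concept classes at levels 1 to 4 (PT, HLT, HLGT, SOC), linked by constant "Is a" relationships, suggest that each populated row stores one path up the MedDRA hierarchy. The sketch below illustrates that reading. It pairs sample values shown for meddra_concept_name_1 to meddra_concept_name_4 into a single path, which is an assumption: the report does not confirm that these samples come from the same record. `is_a_edges` is a hypothetical helper.

```python
LEVELS = ["PT", "HLT", "HLGT", "SOC"]  # concept classes reported for levels 1-4

# Illustrative path built from sample values in this report.
# Treating them as one record is an assumption.
path = [
    "Gelatinous transformation of the bone marrow",
    "Marrow depression and hypoplastic anaemias",
    "Anaemias nonhaemolytic and marrow depression",
    "Blood and lymphatic system disorders",
]

def is_a_edges(names, levels=LEVELS, relationship="Is a"):
    """Build (child, relationship, parent) triples for consecutive levels."""
    labelled = [f"{level}: {name}" for level, name in zip(levels, names)]
    return [
        (labelled[i], relationship, labelled[i + 1])
        for i in range(len(labelled) - 1)
    ]

edges = is_a_edges(path)
for child, rel, parent in edges:
    print(f"{child} --{rel}--> {parent}")
```

The three edges correspond to the relationship_id_12, relationship_id_23 and relationship_id_34 columns.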

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
is 16938 50.0%
a 16938 50.0%

Most occurring characters

Value Count Frequency (%)
I 16938 25.0%
s 16938 25.0%
(space) 16938 25.0%
a 16938 25.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 67752 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
I 16938 25.0%
s 16938 25.0%
(space) 16938 25.0%
a 16938 25.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 67752 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
I 16938 25.0%
s 16938 25.0%
(space) 16938 25.0%
a 16938 25.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 67752 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
I 16938 25.0%
s 16938 25.0%
(space) 16938 25.0%
a 16938 25.0%

relationship_id_23
Categorical

Constant  Missing 

Distinct 1
Distinct (%) < 0.1%
Missing 6983552
Missing (%) 99.8%
Memory size 53.4 MiB
Is a 16938

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 67752
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Is a
2nd row Is a
3rd row Is a
4th row Is a
5th row Is a

Common Values

Value Count Frequency (%)
Is a 16938 0.2%
(Missing) 6983552 99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

2025-04-28T21:09:24.994636 image/svg+xml Matplotlib v3.9.4, https://matplotlib.org/
Value Count Frequency (%)
is 16938
50.0%
a 16938
50.0%

Most occurring characters

Value Count Frequency (%)
I 16938
25.0%
s 16938
25.0%
(space) 16938
25.0%
a 16938
25.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 67752
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
I 16938
25.0%
s 16938
25.0%
(space) 16938
25.0%
a 16938
25.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 67752
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
I 16938
25.0%
s 16938
25.0%
(space) 16938
25.0%
a 16938
25.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 67752
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
I 16938
25.0%
s 16938
25.0%
(space) 16938
25.0%
a 16938
25.0%

relationship_id_34
Categorical

Constant  Missing 

Distinct 1
Distinct (%) < 0.1%
Missing 6983552
Missing (%) 99.8%
Memory size 53.4 MiB
Is a
16938 

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 67752
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Is a
2nd row Is a
3rd row Is a
4th row Is a
5th row Is a

Common Values

Value Count Frequency (%)
Is a 16938
 
0.2%
(Missing) 6983552
99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
is 16938
50.0%
a 16938
50.0%

Most occurring characters

Value Count Frequency (%)
I 16938
25.0%
s 16938
25.0%
(space) 16938
25.0%
a 16938
25.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 67752
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
I 16938
25.0%
s 16938
25.0%
(space) 16938
25.0%
a 16938
25.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 67752
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
I 16938
25.0%
s 16938
25.0%
(space) 16938
25.0%
a 16938
25.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 67752
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
I 16938
25.0%
s 16938
25.0%
(space) 16938
25.0%
a 16938
25.0%

soc_category
Categorical

Missing 

Distinct 8
Distinct (%) < 0.1%
Missing 6983649
Missing (%) 99.8%
Memory size 53.4 MiB
anatomic_site_disorder
9487 
procedural_disorder
1816 
biological_disorder
1806 
lab_test_disorder
1648 
infection_disorder
1147 
Other values (3)
 
937

Length

Max length 22
Median length 22
Mean length 20.412327
Min length 15

Characters and Unicode

Total characters 343764
Distinct characters 19
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row anatomic_site_disorder
2nd row anatomic_site_disorder
3rd row anatomic_site_disorder
4th row anatomic_site_disorder
5th row anatomic_site_disorder

Common Values

Value Count Frequency (%)
anatomic_site_disorder 9487
 
0.1%
procedural_disorder 1816
 
< 0.1%
biological_disorder 1806
 
< 0.1%
lab_test_disorder 1648
 
< 0.1%
infection_disorder 1147
 
< 0.1%
immune_system_disorder 450
 
< 0.1%
foreign_disorder 365
 
< 0.1%
social_disorder 122
 
< 0.1%
(Missing) 6983649
99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
anatomic_site_disorder 9487
56.3%
procedural_disorder 1816
 
10.8%
biological_disorder 1806
 
10.7%
lab_test_disorder 1648
 
9.8%
infection_disorder 1147
 
6.8%
immune_system_disorder 450
 
2.7%
foreign_disorder 365
 
2.2%
social_disorder 122
 
0.7%

Most occurring characters

Value Count Frequency (%)
i 42658
12.4%
r 37679
11.0%
d 35498
10.3%
o 33390
9.7%
e 32204
9.4%
s 28998
8.4%
_ 28426
8.3%
a 24366
7.1%
t 23867
6.9%
c 14378
 
4.2%
Other values (9) 42300
12.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 343764
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
i 42658
12.4%
r 37679
11.0%
d 35498
10.3%
o 33390
9.7%
e 32204
9.4%
s 28998
8.4%
_ 28426
8.3%
a 24366
7.1%
t 23867
6.9%
c 14378
 
4.2%
Other values (9) 42300
12.3%

Most occurring scripts

Value Count Frequency (%)
(unknown) 343764
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
i 42658
12.4%
r 37679
11.0%
d 35498
10.3%
o 33390
9.7%
e 32204
9.4%
s 28998
8.4%
_ 28426
8.3%
a 24366
7.1%
t 23867
6.9%
c 14378
 
4.2%
Other values (9) 42300
12.3%

Most occurring blocks

Value Count Frequency (%)
(unknown) 343764
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
i 42658
12.4%
r 37679
11.0%
d 35498
10.3%
o 33390
9.7%
e 32204
9.4%
s 28998
8.4%
_ 28426
8.3%
a 24366
7.1%
t 23867
6.9%
c 14378
 
4.2%
Other values (9) 42300
12.3%

pediatric_adverse_event
Categorical

Imbalance  Missing 

Distinct 2
Distinct (%) < 0.1%
Missing 6983549
Missing (%) 99.8%
Memory size 53.4 MiB
0.0
16220 
1.0
 
721
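
The Imbalance flag on this variable can be reproduced from the two counts above. The following is a minimal sketch, assuming the score is one minus the Shannon entropy of the class frequencies normalized by log2 of the number of classes. That formula is an assumption about how the profiler scores imbalance, not something this report states, and `imbalance_score` is a hypothetical helper name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy in bits
    of the class frequencies and k is the number of classes."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single class has no notion of balance
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(k)


# Counts for the values 0.0 and 1.0 as listed in this section.
counts = [16220, 721]
score = imbalance_score(counts)
print(f"{score:.1%}")
```

With these counts the score evaluates to about 0.746, in line with the 74.6% imbalance listed for pediatric_adverse_event in the report's alerts. A perfectly balanced two-class column would score 0.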

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 50823
Distinct characters 3
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 16220
 
0.2%
1.0 721
 
< 0.1%
(Missing) 6983549
99.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
0.0 16220
95.7%
1.0 721
 
4.3%

Most occurring characters

Value Count Frequency (%)
0 33161
65.2%
. 16941
33.3%
1 721
 
1.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 50823
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
0 33161
65.2%
. 16941
33.3%
1 721
 
1.4%

Most occurring scripts

Value Count Frequency (%)
(unknown) 50823
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
0 33161
65.2%
. 16941
33.3%
1 721
 
1.4%

Most occurring blocks

Value Count Frequency (%)
(unknown) 50823
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
0 33161
65.2%
. 16941
33.3%
1 721
 
1.4%

probe
Text

Missing 

Distinct 959
Distinct (%) 0.5%
Missing 6805388
Missing (%) 97.2%
Memory size 53.4 MiB

Length

Max length 12
Median length 9
Mean length 9.93671
Min length 6

Characters and Unicode

Total characters 1938672
Distinct characters 16
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1431_at
2nd row 1494_f_at
3rd row 177_at
4th row 200642_at
5th row 200697_at
Value Count Frequency (%)
215125_s_at 1782
 
0.9%
206094_x_at 1782
 
0.9%
208596_s_at 1782
 
0.9%
204532_x_at 1188
 
0.6%
207126_x_at 1188
 
0.6%
221304_at 990
 
0.5%
221305_s_at 990
 
0.5%
222094_at 792
 
0.4%
233334_x_at 792
 
0.4%
222203_s_at 396
 
0.2%
Other values (949) 183420
94.0%

Most occurring characters

Value Count Frequency (%)
2 289989
15.0%
_ 283176
14.6%
a 196965
10.2%
t 195102
10.1%
0 165456
8.5%
1 135324
7.0%
3 100530
 
5.2%
5 99099
 
5.1%
4 97083
 
5.0%
6 77616
 
4.0%
Other values (6) 298332
15.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 1938672
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
2 289989
15.0%
_ 283176
14.6%
a 196965
10.2%
t 195102
10.1%
0 165456
8.5%
1 135324
7.0%
3 100530
 
5.2%
5 99099
 
5.1%
4 97083
 
5.0%
6 77616
 
4.0%
Other values (6) 298332
15.4%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1938672
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
2 289989
15.0%
_ 283176
14.6%
a 196965
10.2%
t 195102
10.1%
0 165456
8.5%
1 135324
7.0%
3 100530
 
5.2%
5 99099
 
5.1%
4 97083
 
5.0%
6 77616
 
4.0%
Other values (6) 298332
15.4%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1938672
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
2 289989
15.0%
_ 283176
14.6%
a 196965
10.2%
t 195102
10.1%
0 165456
8.5%
1 135324
7.0%
3 100530
 
5.2%
5 99099
 
5.1%
4 97083
 
5.0%
6 77616
 
4.0%
Other values (6) 298332
15.4%

sample
Text

Missing 

Distinct 314
Distinct (%) 0.2%
Missing 6806436
Missing (%) 97.2%
Memory size 53.4 MiB

Length

Max length 9
Median length 9
Mean length 7.6552609
Min length 4

Characters and Unicode

Total characters 1485534
Distinct characters 16
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row GSM228562
2nd row GSM228562
3rd row GSM228562
4th row GSM228562
5th row GSM228562
Value Count Frequency (%)
hyb2 1048
 
0.5%
hyb18 1048
 
0.5%
hyb19 1048
 
0.5%
hyb12 1048
 
0.5%
hyb11 1048
 
0.5%
hyb10 1048
 
0.5%
hyb1 1048
 
0.5%
hyb16 1048
 
0.5%
hyb15 1048
 
0.5%
hyb14 1048
 
0.5%
Other values (304) 183574
94.6%

Most occurring characters

Value Count Frequency (%)
2 286120
19.3%
8 135458
9.1%
M 131174
8.8%
S 131174
8.8%
G 131174
8.8%
6 112654
 
7.6%
5 93480
 
6.3%
4 66500
 
4.5%
H 62880
 
4.2%
y 62880
 
4.2%
Other values (6) 272040
18.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 1485534
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
2 286120
19.3%
8 135458
9.1%
M 131174
8.8%
S 131174
8.8%
G 131174
8.8%
6 112654
 
7.6%
5 93480
 
6.3%
4 66500
 
4.5%
H 62880
 
4.2%
y 62880
 
4.2%
Other values (6) 272040
18.3%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1485534
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
2 286120
19.3%
8 135458
9.1%
M 131174
8.8%
S 131174
8.8%
G 131174
8.8%
6 112654
 
7.6%
5 93480
 
6.3%
4 66500
 
4.5%
H 62880
 
4.2%
y 62880
 
4.2%
Other values (6) 272040
18.3%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1485534
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
2 286120
19.3%
8 135458
9.1%
M 131174
8.8%
S 131174
8.8%
G 131174
8.8%
6 112654
 
7.6%
5 93480
 
6.3%
4 66500
 
4.5%
H 62880
 
4.2%
y 62880
 
4.2%
Other values (6) 272040
18.3%

actual
Real number (ℝ)

Missing 

Distinct 76729
Distinct (%) 39.5%
Missing 6806436
Missing (%) 97.2%
Infinite 0
Infinite (%) 0.0%
Mean 138.36131
Minimum 5.2009343
Maximum 12232.902
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 5.2009343
5-th percentile 10.606114
Q1 22.65203
median 46.402756
Q3 106.2966
95-th percentile 538.99159
Maximum 12232.902
Range 12227.701
Interquartile range (IQR) 83.64457

Descriptive statistics

Standard deviation 388.46238
Coefficient of variation (CV) 2.807594
Kurtosis 216.89838
Mean 138.36131
Median Absolute Deviation (MAD) 29.291114
Skewness 11.817311
Sum 26849565
Variance 150903.02
Monotonicity Not monotonic
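
The derived statistics for `actual` follow from the primary ones by simple arithmetic. The following is a minimal sketch that recomputes range, IQR, variance, and CV from the rounded values printed above. The variable names are illustrative, and small differences from the reported figures are expected because the inputs are already rounded.

```python
# Values copied from the quantile and descriptive statistics of `actual`.
minimum, maximum = 5.2009343, 12232.902
q1, q3 = 22.65203, 106.2966
mean, std = 138.36131, 388.46238

value_range = maximum - minimum   # reported: 12227.701
iqr = q3 - q1                     # reported: 83.64457
variance = std ** 2               # reported: 150903.02
cv = std / mean                   # reported: 2.807594

print(f"range={value_range:.3f} iqr={iqr:.5f} "
      f"variance={variance:.2f} cv={cv:.6f}")
```

A CV near 2.8 together with a skewness of about 11.8 indicates a strongly right-skewed distribution, so the median and IQR describe the bulk of the values better than the mean and standard deviation.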
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
16.16396997 70
 
< 0.1%
13.00912799 57
 
< 0.1%
17.66900607 55
 
< 0.1%
13.80442339 54
 
< 0.1%
22.58242099 54
 
< 0.1%
15.82444055 51
 
< 0.1%
15.8920675 50
 
< 0.1%
28.48592034 49
 
< 0.1%
15.72350509 49
 
< 0.1%
26.81545261 48
 
< 0.1%
Other values (76719) 193517
 
2.8%
(Missing) 6806436
97.2%
Value Count Frequency (%)
5.200934277 1
< 0.1%
5.262655185 2
< 0.1%
5.286312991 1
< 0.1%
5.287979996 1
< 0.1%
5.320035545 1
< 0.1%
5.348016531 1
< 0.1%
5.348665504 1
< 0.1%
5.354240401 1
< 0.1%
5.361693787 1
< 0.1%
5.381110847 1
< 0.1%
Value Count Frequency (%)
12232.90214 1
< 0.1%
12008.27225 1
< 0.1%
11986.26306 1
< 0.1%
11941.34582 1
< 0.1%
11850.44007 1
< 0.1%
11817.71133 1
< 0.1%
11812.5872 1
< 0.1%
11549.45386 1
< 0.1%
11482.28835 1
< 0.1%
11401.24467 1
< 0.1%

prediction
Real number (ℝ)

Missing 

Distinct 176989
Distinct (%) 91.2%
Missing 6806436
Missing (%) 97.2%
Infinite 0
Infinite (%) 0.0%
Mean 138.36131
Minimum -628.79359
Maximum 11524.11
Zeros 0
Zeros (%) 0.0%
Negative 701
Negative (%) < 0.1%
Memory size 53.4 MiB

Quantile statistics

Minimum -628.79359
5-th percentile 10.784549
Q1 23.947807
median 47.498616
Q3 107.20521
95-th percentile 537.38797
Maximum 11524.11
Range 12152.903
Interquartile range (IQR) 83.257406

Descriptive statistics

Standard deviation 376.83315
Coefficient of variation (CV) 2.7235443
Kurtosis 209.68467
Mean 138.36131
Median Absolute Deviation (MAD) 29.48177
Skewness 11.579905
Sum 26849565
Variance 142003.22
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
84.97984735 9
 
< 0.1%
50.65371351 9
 
< 0.1%
36.94538579 9
 
< 0.1%
87.20673994 9
 
< 0.1%
28.3649831 9
 
< 0.1%
94.72998205 9
 
< 0.1%
39.67502199 9
 
< 0.1%
28.82278141 9
 
< 0.1%
25.12841837 9
 
< 0.1%
51.14005072 9
 
< 0.1%
Other values (176979) 193964
 
2.8%
(Missing) 6806436
97.2%
Value Count Frequency (%)
-628.7935934 1
< 0.1%
-616.1722196 1
< 0.1%
-556.0704121 2
< 0.1%
-473.2518679 1
< 0.1%
-451.3253869 1
< 0.1%
-441.9756527 1
< 0.1%
-420.2652646 1
< 0.1%
-417.2540298 1
< 0.1%
-402.236643 1
< 0.1%
-388.1981059 1
< 0.1%
Value Count Frequency (%)
11524.1099 1
< 0.1%
11373.17708 1
< 0.1%
11047.70826 1
< 0.1%
10950.32993 1
< 0.1%
10928.08893 1
< 0.1%
10856.4793 1
< 0.1%
10775.93164 1
< 0.1%
10672.77182 1
< 0.1%
10669.51316 1
< 0.1%
10661.02157 1
< 0.1%

residual
Real number (ℝ)

Missing 

Distinct 176989
Distinct (%) 91.2%
Missing 6806436
Missing (%) 97.2%
Infinite 0
Infinite (%) 0.0%
Mean -3.8741192 × 10⁻¹⁴
Minimum -3566.9962
Maximum 6094.7571
Zeros 0
Zeros (%) 0.0%
Negative 106052
Negative (%) 1.5%
Memory size 53.4 MiB

Quantile statistics

Minimum -3566.9962
5-th percentile -62.34314
Q1 -8.9726341
median -0.78726069
Q3 6.984796
95-th percentile 59.324891
Maximum 6094.7571
Range 9661.7534
Interquartile range (IQR) 15.95743

Descriptive statistics

Standard deviation 94.338726
Coefficient of variation (CV) -2.4351013 × 10¹⁵
Kurtosis 285.53276
Mean -3.8741192 × 10⁻¹⁴
Median Absolute Deviation (MAD) 7.9763929
Skewness 4.1824022
Sum -7.6961442 × 10⁻⁹
Variance 8899.7953
Monotonicity Not monotonic
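
The very large coefficient of variation for `residual` is an artifact of dividing by a mean that is zero up to floating-point error, not a sign of extreme dispersion. The following is a minimal sketch using the values printed above; the variable names are illustrative.

```python
# Values copied from the descriptive statistics of `residual`.
std = 94.338726
mean = -3.8741192e-14   # effectively zero (floating-point noise)
q1, q3 = -8.9726341, 6.984796

# CV = std / mean blows up when the mean is numerically zero.
cv = std / mean

# Scale measures that do not divide by the mean stay well behaved.
iqr = q3 - q1           # reported: 15.95743

print(f"cv={cv:.4e} iqr={iqr:.5f}")
```

For a zero-centered variable such as this one, the standard deviation, MAD, or IQR are more informative summaries of spread than the CV.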
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
21.69867072 9
 
< 0.1%
-2.26204953 9
 
< 0.1%
-5.492569319 9
 
< 0.1%
-20.8373478 9
 
< 0.1%
-3.985364064 9
 
< 0.1%
10.20108708 9
 
< 0.1%
-13.70349191 9
 
< 0.1%
-11.83414315 9
 
< 0.1%
-1.357155355 9
 
< 0.1%
13.3048253 9
 
< 0.1%
Other values (176979) 193964
 
2.8%
(Missing) 6806436
97.2%
Value Count Frequency (%)
-3566.996238 1
< 0.1%
-3478.386128 1
< 0.1%
-3161.473045 1
< 0.1%
-2579.662309 1
< 0.1%
-2219.270274 1
< 0.1%
-1972.140323 1
< 0.1%
-1971.806054 1
< 0.1%
-1969.435409 1
< 0.1%
-1968.602113 1
< 0.1%
-1956.043967 1
< 0.1%
Value Count Frequency (%)
6094.757115 1
< 0.1%
3900.076232 1
< 0.1%
3535.515059 1
< 0.1%
3440.438665 1
< 0.1%
2853.126021 1
< 0.1%
2741.21263 1
< 0.1%
2648.913866 1
< 0.1%
2606.186316 1
< 0.1%
2560.685639 2
< 0.1%
2492.359834 1
< 0.1%

f_statistic
Real number (ℝ)

Missing 

Distinct 959
Distinct (%) 0.5%
Missing 6806436
Missing (%) 97.2%
Infinite 0
Infinite (%) 0.0%
Mean 25.312626
Minimum 7.9357806 × 10⁻⁵
Maximum 118.115
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 7.9357806 × 10⁻⁵
5-th percentile 0.13191008
Q1 4.3826629
median 16.375423
Q3 39.172581
95-th percentile 79.001213
Maximum 118.115
Range 118.11492
Interquartile range (IQR) 34.789918

Descriptive statistics

Standard deviation 25.844862
Coefficient of variation (CV) 1.0210265
Kurtosis 0.55344799
Mean 25.312626
Median Absolute Deviation (MAD) 14.593443
Skewness 1.145873
Sum 4912016.3
Variance 667.95688
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
34.86906227 1773
 
< 0.1%
0.0276985382 1773
 
< 0.1%
17.6066331 1773
 
< 0.1%
24.06743353 1182
 
< 0.1%
0.1717596018 1182
 
< 0.1%
60.94618329 985
 
< 0.1%
31.91198938 985
 
< 0.1%
6.532261215 788
 
< 0.1%
4.745979548 788
 
< 0.1%
69.61018591 394
 
< 0.1%
Other values (949) 182431
 
2.6%
(Missing) 6806436
97.2%
Value Count Frequency (%)
7.935780645 × 10⁻⁵ 197
< 0.1%
0.0001477381937 197
< 0.1%
0.0003700914575 394
< 0.1%
0.0004637703998 197
< 0.1%
0.0009930464802 197
< 0.1%
0.001326756269 197
< 0.1%
0.001993672622 197
< 0.1%
0.002616872536 80
 
< 0.1%
0.002975347702 197
< 0.1%
0.003842236844 80
 
< 0.1%
Value Count Frequency (%)
118.115003 197
< 0.1%
113.5721515 197
< 0.1%
110.9068887 197
< 0.1%
108.1158634 197
< 0.1%
107.4835001 197
< 0.1%
106.5162376 197
< 0.1%
105.5249249 197
< 0.1%
103.9309277 197
< 0.1%
101.6288633 197
< 0.1%
100.2938413 197
< 0.1%

f_pvalue
Real number (ℝ)

Missing 

Distinct 959
Distinct (%) 0.5%
Missing 6806436
Missing (%) 97.2%
Infinite 0
Infinite (%) 0.0%
Mean 0.10278914
Minimum 8.1610273 × 10⁻²²
Maximum 0.99290141
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 8.1610273 × 10⁻²²
5-th percentile 4.1557564 × 10⁻¹⁶
Q1 2.4148636 × 10⁻⁹
median 7.4851824 × 10⁻⁵
Q3 0.039035239
95-th percentile 0.71685318
Maximum 0.99290141
Range 0.99290141
Interquartile range (IQR) 0.039035237

Descriptive statistics

Standard deviation 0.22758649
Coefficient of variation (CV) 2.2141104
Kurtosis 4.7567022
Mean 0.10278914
Median Absolute Deviation (MAD) 7.4851824 × 10⁻⁵
Skewness 2.4104604
Sum 19946.643
Variance 0.051795612
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
1.542928473 × 10⁻⁸ 1773
 
< 0.1%
0.8679919686 1773
 
< 0.1%
4.123354768 × 10⁻⁵ 1773
 
< 0.1%
1.957248679 × 10⁻⁶ 1182
 
< 0.1%
0.6790082437 1182
 
< 0.1%
3.507978814 × 10⁻¹³ 985
 
< 0.1%
5.649688463 × 10⁻⁸ 985
 
< 0.1%
0.01135545391 788
 
< 0.1%
0.03056621829 788
 
< 0.1%
1.301954391 × 10⁻¹⁴ 394
 
< 0.1%
Other values (949) 182431
 
2.6%
(Missing) 6806436
97.2%
Value Count Frequency (%)
8.161027291 × 10⁻²² 197
< 0.1%
3.433958916 × 10⁻²¹ 197
< 0.1%
8.059521796 × 10⁻²¹ 197
< 0.1%
1.985479129 × 10⁻²⁰ 197
< 0.1%
2.438335113 × 10⁻²⁰ 197
< 0.1%
3.341543539 × 10⁻²⁰ 197
< 0.1%
4.620312974 × 10⁻²⁰ 197
< 0.1%
7.797449455 × 10⁻²⁰ 197
< 0.1%
1.668779009 × 10⁻¹⁹ 197
< 0.1%
2.6016043 × 10⁻¹⁹ 197
< 0.1%
Value Count Frequency (%)
0.9929014061 197
< 0.1%
0.9903145697 197
< 0.1%
0.9846711039 394
< 0.1%
0.9828406414 197
< 0.1%
0.9748929422 197
< 0.1%
0.9709810073 197
< 0.1%
0.964431531 197
< 0.1%
0.9593324877 80
 
< 0.1%
0.9565554087 197
< 0.1%
0.9507327187 80
 
< 0.1%

ATC_concept_class_id
Categorical

Constant  Missing 

Distinct 1
Distinct (%) 0.2%
Missing 6999914
Missing (%) > 99.9%
Memory size 53.4 MiB
ATC 5th
576 

Length

Max length 7
Median length 7
Mean length 7
Min length 7

Characters and Unicode

Total characters 4032
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row ATC 5th
2nd row ATC 5th
3rd row ATC 5th
4th row ATC 5th
5th row ATC 5th

Common Values

Value Count Frequency (%)
ATC 5th 576
 
< 0.1%
(Missing) 6999914
> 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
atc 576
50.0%
5th 576
50.0%

Most occurring characters

Value Count Frequency (%)
A 576
14.3%
T 576
14.3%
C 576
14.3%
(space) 576
14.3%
5 576
14.3%
t 576
14.3%
h 576
14.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 4032
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
A 576
14.3%
T 576
14.3%
C 576
14.3%
(space) 576
14.3%
5 576
14.3%
t 576
14.3%
h 576
14.3%

Most occurring scripts

Value Count Frequency (%)
(unknown) 4032
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
A 576
14.3%
T 576
14.3%
C 576
14.3%
(space) 576
14.3%
5 576
14.3%
t 576
14.3%
h 576
14.3%

Most occurring blocks

Value Count Frequency (%)
(unknown) 4032
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
A 576
14.3%
T 576
14.3%
C 576
14.3%
(space) 576
14.3%
5 576
14.3%
t 576
14.3%
h 576
14.3%

ATC_concept_id
Categorical

Missing  Uniform 

Distinct 8
Distinct (%) 1.4%
Missing 6999914
Missing (%) > 99.9%
Memory size 53.4 MiB
21602320.0
72 
21602800.0
72 
21602977.0
72 
21603103.0
72 
21603356.0
72 
Other values (3)
216 

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 5760
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 21602320.0
2nd row 21602320.0
3rd row 21602320.0
4th row 21602320.0
5th row 21602320.0

Common Values

Value Count Frequency (%)
21602320.0 72
 
< 0.1%
21602800.0 72
 
< 0.1%
21602977.0 72
 
< 0.1%
21603103.0 72
 
< 0.1%
21603356.0 72
 
< 0.1%
21603967.0 72
 
< 0.1%
21604757.0 72
 
< 0.1%
21604941.0 72
 
< 0.1%
(Missing) 6999914
> 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
21602320.0 72
12.5%
21602800.0 72
12.5%
21602977.0 72
12.5%
21603103.0 72
12.5%
21603356.0 72
12.5%
21603967.0 72
12.5%
21604757.0 72
12.5%
21604941.0 72
12.5%

Most occurring characters

Value Count Frequency (%)
0 1440
25.0%
2 864
15.0%
1 720
12.5%
6 720
12.5%
. 576
 
10.0%
3 432
 
7.5%
7 360
 
6.2%
4 216
 
3.8%
9 216
 
3.8%
5 144
 
2.5%

Most occurring categories

Value Count Frequency (%)
(unknown) 5760
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
0 1440
25.0%
2 864
15.0%
1 720
12.5%
6 720
12.5%
. 576
 
10.0%
3 432
 
7.5%
7 360
 
6.2%
4 216
 
3.8%
9 216
 
3.8%
5 144
 
2.5%

Most occurring scripts

Value Count Frequency (%)
(unknown) 5760
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
0 1440
25.0%
2 864
15.0%
1 720
12.5%
6 720
12.5%
. 576
 
10.0%
3 432
 
7.5%
7 360
 
6.2%
4 216
 
3.8%
9 216
 
3.8%
5 144
 
2.5%

Most occurring blocks

Value Count Frequency (%)
(unknown) 5760
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
0 1440
25.0%
2 864
15.0%
1 720
12.5%
6 720
12.5%
. 576
 
10.0%
3 432
 
7.5%
7 360
 
6.2%
4 216
 
3.8%
9 216
 
3.8%
5 144
 
2.5%

ATC_concept_name
Categorical

Missing  Uniform 

Distinct 8
Distinct (%) 1.4%
Missing 6999914
Missing (%) > 99.9%
Memory size 53.4 MiB
isotretinoin
72 
doxycycline
72 
clarithromycin
72 
isoniazid
72 
montelukast
72 
Other values (3)
216 

Length

Max length 15
Median length 14
Mean length 11.5
Min length 9

Characters and Unicode

Total characters 6624
Distinct characters 21
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row isotretinoin
2nd row isotretinoin
3rd row isotretinoin
4th row isotretinoin
5th row isotretinoin

Common Values

Value Count Frequency (%)
isotretinoin 72
 
< 0.1%
doxycycline 72
 
< 0.1%
clarithromycin 72
 
< 0.1%
isoniazid 72
 
< 0.1%
montelukast 72
 
< 0.1%
ibuprofen 72
 
< 0.1%
methylphenidate 72
 
< 0.1%
mebendazole 72
 
< 0.1%
(Missing) 6999914
> 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
isotretinoin 72
12.5%
doxycycline 72
12.5%
clarithromycin 72
12.5%
isoniazid 72
12.5%
montelukast 72
12.5%
ibuprofen 72
12.5%
methylphenidate 72
12.5%
mebendazole 72
12.5%

Most occurring characters

Value Count Frequency (%)
i 792
12.0%
e 720
 
10.9%
n 648
 
9.8%
o 576
 
8.7%
t 504
 
7.6%
l 360
 
5.4%
a 360
 
5.4%
c 288
 
4.3%
r 288
 
4.3%
y 288
 
4.3%
Other values (11) 1800
27.2%

Most occurring categories

Value Count Frequency (%)
(unknown) 6624
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
i 792
12.0%
e 720
 
10.9%
n 648
 
9.8%
o 576
 
8.7%
t 504
 
7.6%
l 360
 
5.4%
a 360
 
5.4%
c 288
 
4.3%
r 288
 
4.3%
y 288
 
4.3%
Other values (11) 1800
27.2%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6624
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
i 792
12.0%
e 720
 
10.9%
n 648
 
9.8%
o 576
 
8.7%
t 504
 
7.6%
l 360
 
5.4%
a 360
 
5.4%
c 288
 
4.3%
r 288
 
4.3%
y 288
 
4.3%
Other values (11) 1800
27.2%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6624
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
i 792
12.0%
e 720
 
10.9%
n 648
 
9.8%
o 576
 
8.7%
t 504
 
7.6%
l 360
 
5.4%
a 360
 
5.4%
c 288
 
4.3%
r 288
 
4.3%
y 288
 
4.3%
Other values (11) 1800
27.2%

Control
Categorical

Missing 

Distinct 2
Distinct (%) 0.3%
Missing 6999914
Missing (%) > 99.9%
Memory size 53.4 MiB
N
397 
P
179 

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 576
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row N
2nd row N
3rd row N
4th row N
5th row N

Common Values

Value Count Frequency (%)
N 397
 
< 0.1%
P 179
 
< 0.1%
(Missing) 6999914
> 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
n 397
68.9%
p 179
31.1%

Most occurring characters

Value Count Frequency (%)
N 397
68.9%
P 179
31.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 576
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
N 397
68.9%
P 179
31.1%

Most occurring scripts

Value Count Frequency (%)
(unknown) 576
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
N 397
68.9%
P 179
31.1%

Most occurring blocks

Value Count Frequency (%)
(unknown) 576
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
N 397
68.9%
P 179
31.1%

MedDRA_concept_class_id
Categorical

Constant  Missing 

Distinct 1
Distinct (%) 0.2%
Missing 6999914
Missing (%) > 99.9%
Memory size 53.4 MiB
PT
576 

Length

Max length 2
Median length 2
Mean length 2
Min length 2

Characters and Unicode

Total characters 1152
Distinct characters 2
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row PT
2nd row PT
3rd row PT
4th row PT
5th row PT

Common Values

Value Count Frequency (%)
PT 576
 
< 0.1%
(Missing) 6999914
> 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
pt 576
100.0%

Most occurring characters

Value Count Frequency (%)
P 576
50.0%
T 576
50.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 1152
100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
P 576
50.0%
T 576
50.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1152
100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
P 576
50.0%
T 576
50.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1152
100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
P 576
50.0%
T 576
50.0%

MedDRA_concept_id
Real number (ℝ)

Missing 

Distinct 72
Distinct (%) 12.5%
Missing 6999914
Missing (%) > 99.9%
Infinite 0
Infinite (%) 0.0%
Mean 36769521
Minimum 35104182
Maximum 42890510
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 53.4 MiB

Quantile statistics

Minimum 35104182
5-th percentile 35104688
Q1 35909627
median 36315584
Q3 36919090
95-th percentile 42889492
Maximum 42890510
Range 7786328
Interquartile range (IQR) 1009463.2

Descriptive statistics

Standard deviation 1775525.5
Coefficient of variation (CV) 0.04828797
Kurtosis 7.0115245
Mean 36769521
Median Absolute Deviation (MAD) 603389
Skewness 2.734111
Sum 2.1179244 × 10¹⁰
Variance 3.1524909 × 10¹²
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
36315573 8
 
< 0.1%
36315575 8
 
< 0.1%
36315595 8
 
< 0.1%
36315611 8
 
< 0.1%
37019353 8
 
< 0.1%
35506541 8
 
< 0.1%
36315599 8
 
< 0.1%
36315594 8
 
< 0.1%
36313522 8
 
< 0.1%
36313524 8
 
< 0.1%
Other values (62) 496
 
< 0.1%
(Missing) 6999914
> 99.9%
Value Count Frequency (%)
35104182 8
< 0.1%
35104678 8
< 0.1%
35104679 8
< 0.1%
35104688 8
< 0.1%
35104690 8
< 0.1%
35104691 8
< 0.1%
35104692 8
< 0.1%
35104693 8
< 0.1%
35506541 8
< 0.1%
35607134 8
< 0.1%
Value Count Frequency (%)
42890510 8
< 0.1%
42889748 8
< 0.1%
42889495 8
< 0.1%
42889492 8
< 0.1%
42888927 8
< 0.1%
37320187 8
< 0.1%
37019353 8
< 0.1%
36919236 8
< 0.1%
36919230 8
< 0.1%
36919154 8
< 0.1%

MedDRA_concept_name
Text

Missing 

Distinct 72
Distinct (%) 12.5%
Missing 6999914
Missing (%) > 99.9%
Memory size 53.4 MiB

Length

Max length 46
Median length 30
Mean length 22.611111
Min length 8

Characters and Unicode

Total characters 13024
Distinct characters 45
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Blood creatinine abnormal
2nd row Blood creatinine increased
3rd row Creatinine renal clearance decreased
4th row Inulin renal clearance decreased
5th row Renal cortical necrosis
Value Count Frequency (%)
psychosis 72 5.0%
delusion 72 5.0%
abnormal 48 3.3%
disorder 48 3.3%
decreased 40 2.8%
injury 40 2.8%
thrombocytopenia 40 2.8%
platelet 40 2.8%
liver 40 2.8%
creatinine 40 2.8%
Other values (82) 960 66.7%

Most occurring characters

Value Count Frequency (%)
e 1384 10.6%
i 1080 8.3%
a 976 7.5%
o 952 7.3%
r 896 6.9%
n 864 6.6%
(space) 864 6.6%
s 792 6.1%
c 696 5.3%
t 680 5.2%
Other values (35) 3840 29.5%

Most occurring categories

Value Count Frequency (%)
(unknown) 13024
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 13024
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 13024
100.0%

condition_name
Categorical

Missing 

Distinct 4
Distinct (%) 0.2%
Missing 6998141
Missing (%) > 99.9%
Memory size 53.4 MiB
Acute liver injury 1078
GI bleed 928
Acute myocardial infarction 284
Acute kidney injury 59

Length

Max length 27
Median length 19
Mean length 15.162622
Min length 8

Characters and Unicode

Total characters 35617
Distinct characters 22
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Acute liver injury
2nd row Acute liver injury
3rd row Acute liver injury
4th row Acute liver injury
5th row Acute liver injury

Common Values

Value Count Frequency (%)
Acute liver injury 1078 < 0.1%
GI bleed 928 < 0.1%
Acute myocardial infarction 284 < 0.1%
Acute kidney injury 59 < 0.1%
(Missing) 6998141 > 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
acute 1421 23.2%
injury 1137 18.6%
liver 1078 17.6%
gi 928 15.2%
bleed 928 15.2%
myocardial 284 4.6%
infarction 284 4.6%
kidney 59 1.0%

Most occurring characters

Value Count Frequency (%)
e 4414 12.4%
(space) 3770 10.6%
i 3126 8.8%
r 2783 7.8%
u 2558 7.2%
l 2290 6.4%
c 1989 5.6%
n 1764 5.0%
t 1705 4.8%
y 1480 4.2%
Other values (12) 9738 27.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 35617
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 35617
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 35617
100.0%

control
Categorical

Missing 

Distinct 2
Distinct (%) 0.1%
Missing 6998141
Missing (%) > 99.9%
Memory size 53.4 MiB
negative 1256
positive 1093

Length

Max length 8
Median length 8
Mean length 8
Min length 8

Characters and Unicode

Total characters 18792
Distinct characters 10
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row negative
2nd row negative
3rd row negative
4th row negative
5th row negative

Common Values

Value Count Frequency (%)
negative 1256 < 0.1%
positive 1093 < 0.1%
(Missing) 6998141 > 99.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Value Count Frequency (%)
negative 1256 53.5%
positive 1093 46.5%

Most occurring characters

Value Count Frequency (%)
e 3605 19.2%
i 3442 18.3%
v 2349 12.5%
t 2349 12.5%
n 1256 6.7%
a 1256 6.7%
g 1256 6.7%
p 1093 5.8%
o 1093 5.8%
s 1093 5.8%

Most occurring categories

Value Count Frequency (%)
(unknown) 18792
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 18792
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 18792
100.0%

stitch_id
Text

Missing 

Distinct 707
Distinct (%) 1.1%
Missing 6933462
Missing (%) 99.0%
Memory size 53.4 MiB

Length

Max length 12
Median length 12
Mean length 12
Min length 12

Characters and Unicode

Total characters 804336
Distinct characters 13
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 17
Unique (%) < 0.1%

Sample

1st row CID100002713
2nd row CID100002713
3rd row CID100002713
4th row CID100002713
5th row CID100002713
Value Count Frequency (%)
cid100002771 1575 2.3%
cid100003032 1011 1.5%
cid100060795 937 1.4%
cid100005073 787 1.2%
cid100003345 782 1.2%
cid100004158 706 1.1%
cid100004594 634 0.9%
cid100004583 625 0.9%
cid100005514 576 0.9%
cid100003386 566 0.8%
Other values (697) 58829 87.8%

Most occurring characters

Value Count Frequency (%)
0 266797 33.2%
1 98350 12.2%
C 67028 8.3%
D 67028 8.3%
I 67028 8.3%
3 37628 4.7%
2 35844 4.5%
4 34754 4.3%
5 34182 4.2%
6 29021 3.6%
Other values (3) 66676 8.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 804336
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 804336
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 804336
100.0%

medgen_id
Text

Missing 

Distinct 2752
Distinct (%) 4.1%
Missing 6933462
Missing (%) 99.0%
Memory size 53.4 MiB

Length

Max length 8
Median length 8
Mean length 8
Min length 8

Characters and Unicode

Total characters 536224
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 519
Unique (%) 0.8%

Sample

1st row C0017565
2nd row C0017565
3rd row C0030193
4th row C0234238
5th row C0020517
Value Count Frequency (%)
c0039070 1581 2.4%
c0012833 1149 1.7%
c0038325 690 1.0%
c0013404 682 1.0%
c0231218 675 1.0%
c0015672 657 1.0%
c0008031 642 1.0%
c0009676 642 1.0%
c0042109 618 0.9%
c0015230 577 0.9%
Other values (2742) 59115 88.2%

Most occurring characters

Value Count Frequency (%)
0 152290 28.4%
C 67028 12.5%
2 48727 9.1%
1 47079 8.8%
3 46276 8.6%
4 33569 6.3%
5 30356 5.7%
8 28433 5.3%
9 28237 5.3%
7 27465 5.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 536224
100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 536224
100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 536224
100.0%

Interactions

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
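
Nullity correlation can be read as the Pearson correlation between per-column "is missing" indicators. The sketch below shows the idea in plain Python on a handful of hypothetical rows. The column names are borrowed from this report, but the values and the missingness pattern are made up for illustration, and the helper names are ours rather than the profiler's.

```python
from math import sqrt

# Hypothetical toy rows; None marks a missing cell.
rows = [
    {"stitch_id": "CID100002713", "medgen_id": "C0017565", "control": None},
    {"stitch_id": "CID100002713", "medgen_id": "C0030193", "control": None},
    {"stitch_id": None, "medgen_id": None, "control": "negative"},
    {"stitch_id": None, "medgen_id": None, "control": "positive"},
    {"stitch_id": None, "medgen_id": None, "control": None},
]


def nullity(rows, column):
    """Return 1 where the column is missing and 0 where it is present."""
    return [1 if row[column] is None else 0 for row in rows]


def pearson(x, y):
    """Pearson correlation; None if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None  # fully present or fully missing column
    return cov / sqrt(vx * vy)


def nullity_correlation(rows, col_a, col_b):
    return pearson(nullity(rows, col_a), nullity(rows, col_b))


same = nullity_correlation(rows, "stitch_id", "medgen_id")
opposite = nullity_correlation(rows, "stitch_id", "control")
print(same, opposite)
```

A value near 1 means two columns tend to be missing together, and a negative value means one tends to be present when the other is missing. With pandas, `df.isna().corr()` computes roughly the same matrix in one call.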

Sample

ade atc_concept_id meddra_concept_id cluster_id gt_null_statistic gt_null_99 max_score_nichd cluster_name ade_nreports table_name nichd gam_score norm gam_score_se gam_score_90mse gam_score_90pse D E DE ade_name category atc_concept_name meddra_concept_name atc_concept_class_id meddra_concept_class_id a b c d lwr odds_ratio upr pvalue fdr null_99 safetyreportid sex reporter_qualification receive_date XA XB XC XD XG XH XJ XL XM XN XP XR XS XV polypharmacy atc1_concept_name raw_code gene_symbol type soc auroc wt_pvalue ttest_statistic ttest_pvalue atc_concept_code ndrugreports atc4_concept_name atc4_concept_code atc3_concept_name atc3_concept_code atc2_concept_name atc2_concept_code atc1_concept_code drugbank_id id action uniprot_id entrez_id meddra_concept_name_4 neventreports meddra_concept_class_id_1 meddra_concept_class_id_2 meddra_concept_class_id_3 meddra_concept_class_id_4 meddra_concept_code_1 meddra_concept_code_2 meddra_concept_code_3 meddra_concept_code_4 meddra_concept_id_2 meddra_concept_id_3 meddra_concept_id_4 meddra_concept_name_1 meddra_concept_name_2 meddra_concept_name_3 relationship_id_12 relationship_id_23 relationship_id_34 soc_category pediatric_adverse_event probe sample actual prediction residual f_statistic f_pvalue ATC_concept_class_id ATC_concept_id ATC_concept_name Control MedDRA_concept_class_id MedDRA_concept_id MedDRA_concept_name condition_name control stitch_id medgen_id
0 1588648_35809076 1588648.0 35809076.0 2.0 1.0 0.0 late_adolescence Increase 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1588648_36315755 1588648.0 36315755.0 2.0 1.0 1.0 late_adolescence Increase 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 1588648_36416514 1588648.0 36416514.0 2.0 1.0 0.0 late_adolescence Increase 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 1588648_37019318 1588648.0 37019318.0 2.0 1.0 0.0 late_adolescence Increase 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 1588648_37019399 1588648.0 37019399.0 2.0 1.0 1.0 late_adolescence Increase 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 1588648_37522220 1588648.0 37522220.0 2.0 1.0 0.0 late_adolescence Increase 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 1588697_35104746 1588697.0 35104746.0 4.0 0.0 0.0 term_neonatal Decrease 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 1588697_35104812 1588697.0 35104812.0 2.0 0.0 0.0 late_adolescence Increase 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 1588697_35104824 1588697.0 35104824.0 2.0 0.0 0.0 late_adolescence Increase 3.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 1588697_35104834 1588697.0 35104834.0 4.0 0.0 0.0 term_neonatal Decrease 1.0 ade NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
ade atc_concept_id meddra_concept_id cluster_id gt_null_statistic gt_null_99 max_score_nichd cluster_name ade_nreports table_name nichd gam_score norm gam_score_se gam_score_90mse gam_score_90pse D E DE ade_name category atc_concept_name meddra_concept_name atc_concept_class_id meddra_concept_class_id a b c d lwr odds_ratio upr pvalue fdr null_99 safetyreportid sex reporter_qualification receive_date XA XB XC XD XG XH XJ XL XM XN XP XR XS XV polypharmacy atc1_concept_name raw_code gene_symbol type soc auroc wt_pvalue ttest_statistic ttest_pvalue atc_concept_code ndrugreports atc4_concept_name atc4_concept_code atc3_concept_name atc3_concept_code atc2_concept_name atc2_concept_code atc1_concept_code drugbank_id id action uniprot_id entrez_id meddra_concept_name_4 neventreports meddra_concept_class_id_1 meddra_concept_class_id_2 meddra_concept_class_id_3 meddra_concept_class_id_4 meddra_concept_code_1 meddra_concept_code_2 meddra_concept_code_3 meddra_concept_code_4 meddra_concept_id_2 meddra_concept_id_3 meddra_concept_id_4 meddra_concept_name_1 meddra_concept_name_2 meddra_concept_name_3 relationship_id_12 relationship_id_23 relationship_id_34 soc_category pediatric_adverse_event probe sample actual prediction residual f_statistic f_pvalue ATC_concept_class_id ATC_concept_id ATC_concept_name Control MedDRA_concept_class_id MedDRA_concept_id MedDRA_concept_name condition_name control stitch_id medgen_id
7000480 21605306_35809130 21605306.0 35809130.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Flushing NaN sincalide Flushing NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN General disorders and administration site conditions NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0016382
7000481 21605306_35809130 21605306.0 35809130.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Flushing NaN sincalide Flushing NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Skin and subcutaneous tissue disorders NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0016382
7000482 21605306_35809130 21605306.0 35809130.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Flushing NaN sincalide Flushing NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Vascular disorders NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0016382
7000483 21605306_35809134 21605306.0 35809134.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Hyperhidrosis NaN sincalide Hyperhidrosis NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN General disorders and administration site conditions NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0038990
7000484 21605306_35809134 21605306.0 35809134.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Hyperhidrosis NaN sincalide Hyperhidrosis NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Skin and subcutaneous tissue disorders NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0038990
7000485 21605306_35809134 21605306.0 35809134.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Hyperhidrosis NaN sincalide Hyperhidrosis NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN General disorders and administration site conditions NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0700590
7000486 21605306_35809134 21605306.0 35809134.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Hyperhidrosis NaN sincalide Hyperhidrosis NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Skin and subcutaneous tissue disorders NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0700590
7000487 21605306_35809243 21605306.0 35809243.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Pain NaN sincalide Pain NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN General disorders and administration site conditions NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0030193
7000488 21605306_36718317 21605306.0 36718317.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Loss of consciousness NaN sincalide Loss of consciousness NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Nervous system disorders NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0039070
7000489 21605306_37320158 21605306.0 37320158.0 NaN NaN NaN NaN NaN NaN sider NaN NaN NaN NaN NaN NaN NaN NaN NaN sincalide and Erythema NaN sincalide Erythema NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Skin and subcutaneous tissue disorders NaN NaN NaN NaN V04CC03 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN CID100032800 C0041834